Re: cleaning perl code

2020-04-09 Thread Andrew Dunstan


On 4/9/20 2:26 PM, Peter Eisentraut wrote:
> On 2020-04-09 19:47, Robert Haas wrote:
>> On Thu, Apr 9, 2020 at 11:44 AM Andrew Dunstan
>>  wrote:
>>> We currently only run perlcritic at severity level 5, which is fairly
>>> permissive. I'd like to reduce that, ideally to, say, level 3, which is
>>> what I use for the buildfarm code.
>>>
>>> But let's start by going to severity level 4.
>>
>> I continue to be skeptical of perlcritic. I think it complains about a
>> lot of things which don't matter very much. We should consider whether
>> the effort it takes to keep it warning-clean has proportionate
>> benefits.
>
> Let's see what the patches look like.  At least some of the warnings
> look reasonable, especially in the sense that they are things casual
> Perl programmers might accidentally do wrong.



OK, I'll prep one or two. I used to be of Robert's opinion, but I've
come around some on it.


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Default setting for enable_hashagg_disk

2020-04-09 Thread Robert Haas
On Thu, Apr 9, 2020 at 7:49 AM Tomas Vondra
 wrote:
> Is it really any different from our enable_* GUCs? Even if you do e.g.
> enable_sort=off, we may still do a sort. Same for enable_groupagg etc.

I think it's actually pretty different. All of the other enable_* GUCs
disable an entire type of plan node, except for cases where that would
otherwise result in planning failure. This just disables a portion of
the planning logic for a certain kind of node, without actually
disabling the whole node type. I'm not sure that's a bad idea, but it
definitely seems to be inconsistent with what we've done in the past.
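
For context, the mechanism behind the existing enable_* GUCs is roughly the
following (a hedged sketch in the style of cost_sort() in costsize.c; enable_sort
and disable_cost are real planner symbols, but the surrounding code is
abbreviated): the path is still generated, it is just penalized so heavily that
the planner only picks it when nothing else can produce the result, which is
also why a "disabled" node can still show up in EXPLAIN, flagged by a huge cost.

    /* Sketch of the usual enable_* pattern: the plan type is not removed,
     * merely made to look absurdly expensive (disable_cost is 1.0e10 in
     * costsize.c), so it survives only as a last resort. */
    if (!enable_sort)
        startup_cost += disable_cost;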

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: where should I stick that backup?

2020-04-09 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> I think we need to step back and look at the larger issue.  The real
> argument goes back to the Unix command-line API vs the VMS/Windows API. 
> The former has discrete parts that can be stitched together, while the
> VMS/Windows API presents a more duplicative but more holistic API for
> every piece.  We have discussed using shell commands for
> archive_command, and even more recently, for the server pass phrase.  

When it comes to something like the server pass phrase, it seems much
more reasonable to consider using a shell script (though still perhaps
not ideal) because it's not involved directly in ensuring that the data
is reliably stored and it's pretty clear that if it doesn't work the
worst thing that happens is that the database doesn't start up, but it
won't corrupt any data or destroy it or do other bad things.

> To get more specific, I think we have to understand how the
> _requirements_ of the job match the shell script API, with stdin,
> stdout, stderr, return code, and command-line arguments.  Looking at
> archive_command, the command-line arguments allow specification of file
> names, but quoting can be complex.  The error return code and stderr
> output seem to work fine.  There is no clean API for fsync and testing
> if the file exists, so that all that has to be hand done in one
> command-line.  This is why many users use pre-written archive_command
> shell scripts.

We aren't really considering all of the use-cases though. Specifically,
things like pushing to s3 or gcs require, at least, good retry logic,
and that's before starting to think about things like high-rate systems
(spawning lots of new processes isn't free, particularly if they're
written in shell script, though any interpreted language is expensive) and
wanting to parallelize.

> This brings up a few questions:
> 
> *  Should we have split apart archive_command into file-exists, copy,
> fsync-file?  Should we add that now?

No..  The right approach to improving on archive command is to add a way
for an extension to take over that job, maybe with a complete background
worker of its own, or perhaps a shared library that can be loaded by the
archiver process, at least if we're talking about how to allow people to
extend it.

Potentially a better answer is to just build this stuff into PG- things
like "archive WAL to s3/GCS with these credentials" are what an awful
lot of users want.  There's then some who want "archive first to this
other server, and then archive to s3/GCS", or more complex options.

I'll also point out that there's not one "s3"..  there's quite a few
alternatives, including some which are open source, which talk the s3
protocol (sadly, they don't all do it perfectly, which is why we are
talking about building a GCS-specific driver for gcs rather than using
their s3 gateway, but still, s3 isn't just 'one thing').

> *  How well does this backup requirement match with the shell command
> API?

For my part, it's not just a question of an API, but it's a question of
who is going to implement a good and reliable solution- PG developers,
or some admin who is just trying to get PG up and running in their
environment..?  One aspect of that is being knowledgeable about where all
the land mines are- like the whole fsync thing.  Sure, if you're a PG
developer or you've been around long enough, you're going to realize
that 'cp' isn't going to fsync() the file and therefore it's a pretty
high risk choice for archive_command, and you'll understand just how
important WAL is. But there's certainly an awful lot of folks out there
who don't realize that, or at least don't think about it when they're
standing up a new system; instead they're just following our docs
with the expectation that those docs are providing good advice.
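
To make the fsync gap concrete: a safe archive copy has to flush both the new
file and its containing directory before reporting success, which plain cp does
not do. A minimal POSIX sketch follows (illustrative only; not PostgreSQL code
and not a complete archive_command implementation):

    #include <fcntl.h>
    #include <unistd.h>

    /*
     * After writing the archived segment to out_fd inside out_dir, flush both
     * the file and its directory entry; plain cp skips this, so a crash can
     * lose a segment the archiver already reported as safely stored.
     */
    static int
    fsync_archived_file(int out_fd, const char *out_dir)
    {
        int dir_fd;

        if (fsync(out_fd) != 0)
            return -1;

        dir_fd = open(out_dir, O_RDONLY);
        if (dir_fd < 0)
            return -1;
        if (fsync(dir_fd) != 0)
        {
            close(dir_fd);
            return -1;
        }
        return close(dir_fd);
    }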

Thanks,

Stephen




Re: Multiple FPI_FOR_HINT for the same block during killing btree index items

2020-04-09 Thread James Coleman
On Thu, Apr 9, 2020 at 3:05 PM Peter Geoghegan  wrote:
>
> On Wed, Apr 8, 2020 at 10:56 PM Masahiko Sawada
>  wrote:
> > Here is the reproducer:
>
> What version of Postgres did you notice the actual customer issue on?
> I ask because I wonder if the work on B-Tree indexes in Postgres 12
> affects the precise behavior you get here with real world workloads.
> It probably makes _bt_killitems() more effective with some workloads,
> which naturally increases the likelihood of having multiple FPI issued
> in the manner that you describe. OTOH, it might make it less likely
> with low cardinality indexes, since large groups of garbage duplicate
> tuples tend to get concentrated on just a few leaf pages.

We saw the issue on our PG11 clusters. The specific index we noticed
in the wal dump (I don't think we confirmed if there were others) was
one on a `created_at` column, to give you an idea of cardinality.

> > The inner test in the comment "found the item" never tests the item
> > for being dead. So maybe we can add !ItemIdIsDead(iid) to that
> > condition. But there is still a race condition where recording multiple
> > FPIs can happen. Maybe a better solution is to change the lock to
> > exclusive, at least when wal_log_hints = on, so that only one process
> > can run this code -- the reduction in concurrency might be won back by
> > the fact that we don't wal-log the page multiple times.
>
> I like the idea of checking !ItemIdIsDead(iid) as a further condition
> of killing the item -- there is clearly no point in doing work to kill
> an item that is already dead. I don't like the idea of using an
> exclusive buffer lock (even if it's just with wal_log_hints = on),
> though.

I don't have a strong opinion on the lock.

James
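
For reference, the check being discussed is a one-line addition to the "found
the item" branch. A minimal sketch follows; it is not the committed code, and
the loop context and variable names (offnum, kitem, killedsomething) are
assumptions based on the _bt_killitems() structure in nbtutils.c, but
ItemIdIsDead(), ItemIdMarkDead(), PageGetItemId(), PageGetItem() and
ItemPointerEquals() are the real macros/functions involved:

    /* Inside the offset-search loop of _bt_killitems() (sketch only): */
    ItemId      iid = PageGetItemId(page, offnum);
    IndexTuple  ituple = (IndexTuple) PageGetItem(page, iid);

    if (!ItemIdIsDead(iid) &&                      /* proposed extra condition */
        ItemPointerEquals(&ituple->t_tid, &kitem->heapTid))
    {
        /* found the item and it isn't already dead: kill it */
        ItemIdMarkDead(iid);
        killedsomething = true;
        break;          /* out of the inner search loop */
    }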




Re: Parallel copy

2020-04-09 Thread Andres Freund
Hi,

On April 9, 2020 4:01:43 AM PDT, Amit Kapila  wrote:
>On Thu, Apr 9, 2020 at 3:55 AM Ants Aasma  wrote:
>>
>> On Wed, 8 Apr 2020 at 22:30, Robert Haas  wrote:
>>
>> > - The portion of the time that is used to split the lines is not
>> > easily parallelizable. That seems to be a fairly small percentage for
>> > a reasonably wide table, but it looks significant (13-18%) for a
>> > narrow table. Such cases will gain less performance and be limited to
>> > a smaller number of workers. I think we also need to be careful about
>> > files whose lines are longer than the size of the buffer. If we're not
>> > careful, we could get a significant performance drop-off in such
>> > cases. We should make sure to pick an algorithm that seems like it
>> > will handle such cases without serious regressions and check that a
>> > file composed entirely of such long lines is handled reasonably
>> > efficiently.
>>
>> I don't have a proof, but my gut feel tells me that it's fundamentally
>> impossible to ingest csv without a serial line-ending/comment
>> tokenization pass.

I can't quite see a way either. But even if it were, I have a hard time seeing 
parallelizing that path as the right thing.


>I think even if we try to do it via multiple workers it might not be
>better.  In such a scheme,  every worker needs to update the end
>boundaries and the next worker to keep a check if the previous has
>updated the end pointer.  I think this can add a significant
>synchronization effort for cases where tuples are of 100 or so bytes
>which will be a common case.

It seems like it'd also have terrible caching and instruction level parallelism 
behavior. By constantly switching the process that analyzes boundaries, the 
current data will have to be brought into l1/register, rather than staying 
there.

I'm fairly certain that we do *not* want to distribute input data between 
processes on a single tuple basis. Probably not even below a few hundred kb. If 
there's any sort of natural clustering in the loaded data - extremely common, 
think timestamps - splitting on a granular basis will make indexing much more 
expensive. And have a lot more contention.


>> The current line splitting algorithm is terrible.
>> I'm currently working with some scientific data where on ingestion
>> CopyReadLineText() is about 25% on profiles. I prototyped a
>> replacement that can do ~8GB/s on narrow rows, more on wider ones.

We should really replace the entire copy parsing code. It's terrible.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Parallel copy

2020-04-09 Thread Robert Haas
On Thu, Apr 9, 2020 at 2:55 PM Andres Freund  wrote:
> I'm fairly certain that we do *not* want to distribute input data between 
> processes on a single tuple basis. Probably not even below a few hundred kb. 
> If there's any sort of natural clustering in the loaded data - extremely 
> common, think timestamps - splitting on a granular basis will make indexing 
> much more expensive. And have a lot more contention.

That's a fair point. I think the solution ought to be that once any
process starts finding line endings, it continues until it's grabbed
at least a certain amount of data for itself. Then it stops and lets
some other process grab a chunk of data.

Or are you are arguing that there should be only one process that's
allowed to find line endings for the entire duration of the load?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Parallel copy

2020-04-09 Thread Andres Freund
Hi, 

On April 9, 2020 12:29:09 PM PDT, Robert Haas  wrote:
>On Thu, Apr 9, 2020 at 2:55 PM Andres Freund  wrote:
>> I'm fairly certain that we do *not* want to distribute input data
>> between processes on a single tuple basis. Probably not even below a
>> few hundred kb. If there's any sort of natural clustering in the loaded
>> data - extremely common, think timestamps - splitting on a granular
>> basis will make indexing much more expensive. And have a lot more
>> contention.
>
>That's a fair point. I think the solution ought to be that once any
>process starts finding line endings, it continues until it's grabbed
>at least a certain amount of data for itself. Then it stops and lets
>some other process grab a chunk of data.
>
>Or are you are arguing that there should be only one process that's
>allowed to find line endings for the entire duration of the load?

I've not yet read the whole thread. So I'm probably restating ideas.

Imo, yes, there should be only one process doing the chunking. For ilp, cache 
efficiency, but also because the leader is the only process with access to the 
network socket. It should load input data into one large buffer that's shared 
across processes. There should be a separate ringbuffer with tuple/partial 
tuple (for huge tuples) offsets. Worker processes should grab large chunks of 
offsets from the offset ringbuffer. If the ringbuffer is not full, the worker 
chunks should be reduced in size.  

Given that everything stalls if the leader doesn't accept further input data, 
as well as when there are no available split chunks, it doesn't seem like a 
good idea to have the leader do other work.
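
As a rough visual aid for the design sketched above, the shared state could
look something like the following. This is purely illustrative - every name,
type and size here is an assumption, not the proposed patch - but it captures
the shape: one leader-filled input buffer plus a ring of tuple-boundary
descriptors that workers drain in batches.

    #define INPUT_BUFFER_SIZE   (64 * 1024 * 1024)   /* example size only */
    #define BOUNDARY_RING_SIZE  8192

    typedef struct TupleBoundary
    {
        uint64      offset;       /* start of the tuple in input_data */
        uint32      length;       /* bytes in this (partial) tuple */
        bool        is_partial;   /* oversized tuple continued in next entry? */
    } TupleBoundary;

    typedef struct ParallelCopyShared
    {
        /* filled by the leader from the COPY STDIN network socket */
        char        input_data[INPUT_BUFFER_SIZE];

        /* ring of boundaries produced by the leader's line splitting */
        TupleBoundary boundaries[BOUNDARY_RING_SIZE];
        pg_atomic_uint64 ring_head;   /* next slot the leader will fill */
        pg_atomic_uint64 ring_tail;   /* next slot a worker will claim */
    } ParallelCopyShared;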


I don't think optimizing/targeting copy from local files, where multiple 
processes could read, is useful. COPY STDIN is the only thing that practically 
matters.

Andres


-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Default setting for enable_hashagg_disk

2020-04-09 Thread Jeff Davis
On Thu, 2020-04-09 at 15:26 -0400, Robert Haas wrote:
> I think it's actually pretty different. All of the other enable_*
> GUCs
> disable an entire type of plan node, except for cases where that
> would
> otherwise result in planning failure. This just disables a portion of
> the planning logic for a certain kind of node, without actually
> disabling the whole node type. I'm not sure that's a bad idea, but it
> definitely seems to be inconsistent with what we've done in the past.

The patch adds two GUCs. Both are slightly weird, to be honest, but let
me explain the reasoning. I am open to other suggestions.

1. enable_hashagg_disk (default true):

This is essentially there just to get some of the old behavior back, to
give people an escape hatch if they see bad plans while we are tweaking
the costing. The old behavior was weird, so this GUC is also weird.

Perhaps we can make this a compatibility GUC that we eventually drop? I
don't necessarily think this GUC would make sense, say, 5 versions from
now. I'm just trying to be conservative because I know that, even if
the plans are faster for 90% of people, the other 10% will be unhappy
and want a way to work around it.

2. enable_groupingsets_hash_disk (default false):

This is about how we choose which grouping sets to hash and which to
sort when generating mixed mode paths.

Even before this patch, there are quite a few paths that could be
generated. It tries to estimate the size of each grouping set's hash
table, and then see how many it can fit in work_mem (knapsack), while
also taking advantage of any path keys, etc.

With Disk-based Hash Aggregation, in principle we can generate paths
representing any combination of hashing and sorting for the grouping
sets. But that would be overkill (and grow to a huge number of paths if
we have more than a handful of grouping sets). So I think the existing
planner logic for grouping sets is fine for now. We might come up with
a better approach later.

But that created a testing problem, because if the planner estimates
correctly, no hashed grouping sets will spill, and the spilling code
won't be exercised. This GUC makes the planner disregard which grouping
sets' hash tables will fit, making it much easier to exercise the
spilling code. Is there a better way I should be testing this code
path?

Regards,
Jeff Davis






Re: debian bugrept involving fast default crash in pg11.7

2020-04-09 Thread Justin Pryzby
On Thu, Apr 09, 2020 at 02:36:26PM -0400, Tim Bishop wrote:
> SELECT attrelid::regclass, * FROM pg_attribute WHERE atthasmissing;
> -[ RECORD 1 ]-+-
> attrelid  | download
> attrelid  | 22749
> attname   | filetype

But that table isn't involved in the crashing query, right ?
Are data_stage() and income_index() locally defined functions ?  PLPGSQL ??
Do they access the download table (or view or whatever it is) ?

Thanks,
-- 
Justin




Re: Catalog invalidations vs catalog scans vs ScanPgRelation()

2020-04-09 Thread Robert Haas
[ belatedly responding ]

On Sat, Feb 29, 2020 at 3:17 PM Andres Freund  wrote:
> My preliminary conclusion is that it's simply not safe to do
> SnapshotResetXmin() from within InvalidateCatalogSnapshot(),
> PopActiveSnapshot(), UnregisterSnapshotFromOwner() etc. Instead we need
> to defer the SnapshotResetXmin() call until at least
> CommitTransactionCommand()? Outside of that there ought (with exception
> of multi-transaction commands, but they have to be careful anyway) to be
> no "in progress" sequences of related catalog lookups/modifications.
>
> Alternatively we could ensure that all catalog lookup/mod sequences
> ensure that the first catalog snapshot is registered. But that seems
> like a gargantuan task?

If I understand correctly, the scenario you're concerned about is
something like this:

(1) Transaction #1 reads a catalog tuple and immediately releases its snapshot.
(2) Transaction #2 performs a DELETE or UPDATE on that catalog tuple.
(3) Transaction #3 completes a VACUUM on the table, so that the old
tuple is pruned, thus marked dead, and then the TID is marked unused.
(4) Transaction #4 performs an INSERT which reuses the same TID.
(5) Transaction #1 now performs a DELETE or UPDATE using the previous
TID and updates the unrelated tuple which reused the TID rather than
the intended tuple.

It seems to me that what is supposed to prevent this from happening is
that you aren't supposed to release your snapshot at the end of step
#1. You're supposed to hold onto it until after step #5 is complete. I
think that there are fair number of places that are already careful
about that. I just picked a random source file that I knew Tom had
written and found this bit in extension_config_remove:

extScan = systable_beginscan(extRel, ExtensionOidIndexId, true,
 NULL, 1, key);

extTup = systable_getnext(extScan);
...a lot more stuff...
CatalogTupleUpdate(extRel, &extTup->t_self, extTup);

systable_endscan(extScan);

Quite apart from this issue, there's a very good reason why it's like
that: extTup might be pointing right into a disk buffer, and if we did
systable_endscan() before the last access to it, our pointer could
become invalid. A fair number of places are protected due to the scan
being kept open like this, but it looks like most of the ones that use
SearchSysCacheCopyX + CatalogTupleUpdate are problematic.
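
For contrast with the extension_config_remove() excerpt above, the pattern
being flagged looks roughly like this (an illustration only, not actual
PostgreSQL source; rel and relid are assumed to be an open catalog Relation
and a target OID already in scope). Nothing here keeps a scan or registered
snapshot open between fetching the tuple copy and updating it by TID, which is
exactly the window the TID-reuse hazard needs:

    HeapTuple   tup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));

    if (!HeapTupleIsValid(tup))
        elog(ERROR, "cache lookup failed for relation %u", relid);

    /* ... modify fields of the copied tuple ... */

    /* The TID came from the syscache copy; no snapshot or scan pins it down. */
    CatalogTupleUpdate(rel, &tup->t_self, tup);
    heap_freetuple(tup);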

I would be inclined to fix this problem by adjusting those places to
keep a snapshot open rather than by making some arbitrary rule about
holding onto a catalog snapshot until the end of the command. That
seems like a fairly magical coding rule that will happen to work in
most practical cases but isn't really a principled approach to the
problem. Besides being magical, it's also fragile: just deciding to
use some other snapshot instead of the catalog snapshot causes your
code to be subtly broken in a way you're surely not going to expect.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Default setting for enable_hashagg_disk

2020-04-09 Thread Justin Pryzby
On Thu, Apr 09, 2020 at 01:48:55PM +0200, Tomas Vondra wrote:
> On Tue, Apr 07, 2020 at 05:39:01PM -0500, Justin Pryzby wrote:
> > On Tue, Apr 07, 2020 at 11:20:46AM -0700, Jeff Davis wrote:
> > > The enable_hashagg_disk GUC, if set to true, chooses HashAgg based on
> > > costing. If false, it only generates a HashAgg path if it thinks it will
> > > fit in work_mem, similar to the old behavior (though it will now spill
> > > to disk if the planner was wrong about it fitting in work_mem).  The
> > > current default is true.
> > 
> > Are there any other GUCs that behave like that?  It's confusing to me when
> > I see "Disk Usage: ... kB", despite setting it to "disable", and without the
> > usual disable_cost.  I realize that postgres chose the plan on the hypothesis
> > that it would *not* exceed work_mem, and that spilling to disk is considered
> > preferable to ignoring the setting, and that "going back" to planning phase
> > isn't a possibility.
> 
> Is it really any different from our enable_* GUCs? Even if you do e.g.
> enable_sort=off, we may still do a sort. Same for enable_groupagg etc.

Those show that the GUC was disabled by showing disable_cost.  That's what's
different about this one.

Also.. there's no such thing as enable_groupagg?  Unless I've been missing out
on something.

> > "This setting determines whether the planner will elect to use a hash plan
> > which it expects will exceed work_mem and spill to disk.  During execution,
> > hash nodes which exceed work_mem will spill to disk even if this setting is
> > disabled.  To avoid spilling to disk, either increase work_mem (or set
> > enable_hashagg=off)."
> > 
> > For sure the release notes should recommend re-calibrating work_mem.
> 
> I don't follow. Why would the recalibrating be needed?

Because HashAgg plans which used to run fine (because they weren't prevented
from overflowing work_mem) might now run poorly after spilling to disk (because
of overflowing work_mem).

-- 
Justin




Re: Default setting for enable_hashagg_disk

2020-04-09 Thread Jeff Davis
On Thu, 2020-04-09 at 12:24 -0500, Justin Pryzby wrote:
> Also.. there's no such thing as enable_groupagg?  Unless I've been
> missing out
> on something.

I thought about adding that, and went so far as to make a patch. But it
didn't seem right to me -- the grouping isn't what takes the time, it's
the sorting. So what would the point of such a GUC be? To disable
GroupAgg when the input data is already sorted? Or a strange way to
disable Sort?

> Because HashAgg plans which used to run fine (because they weren't
> prevented
> from overflowing work_mem) might now run poorly after spilling to
> disk (because
> of overflowing work_mem).

It's probably worth a mention in the release notes, but I wouldn't word
it too strongly. Typically the performance difference is not a lot if
the workload still fits in system memory.

Regards,
Jeff Davis






Re: more ALTER .. DEPENDS ON EXTENSION fixes

2020-04-09 Thread Alvaro Herrera
On 2020-Mar-11, Tom Lane wrote:

> > thanks for it) to backbranches or just to master.  It seems legitimate
> > to see it as a feature addition, but OTOH the overall feature is not
> > complete without it ...
> 
> 0003 is the command addition to allow removing such a dependency,
> right?  Given the lack of field demand I see no reason to risk
> adding it to the back branches.

Yeah, okay.

I hereby request permission to push this patch past the feature freeze
date; it's a very small one that completes an existing feature (rather
than a complete new feature in itself), and it's not intrusive nor
likely to break anything.

> BTW, I did not like the syntax too much.  "NO DEPENDS ON EXTENSION"
> doesn't seem like good English.  "NOT DEPENDS ON EXTENSION" is hardly
> any better.  The real problem with both is that an ALTER action should
> be, well, an action.  A grammar stickler would say that it should be
> "ALTER thing DROP DEPENDENCY ON EXTENSION ext", but perhaps we could
> get away with "ALTER thing DROP DEPENDS ON EXTENSION ext" to avoid
> adding a new keyword.  By that logic the original command should have
> been "ALTER thing ADD DEPENDS ON EXTENSION ext", but I suppose it's
> too late for that.

I will be submitting a version with these changes shortly.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: where should I stick that backup?

2020-04-09 Thread Bruce Momjian
On Mon, Apr  6, 2020 at 07:32:45PM +0200, Magnus Hagander wrote:
> On Mon, Apr 6, 2020 at 4:45 PM Stephen Frost  wrote:
> > For my 2c, at least, introducing more shell commands into critical parts
> > of the system is absolutely the wrong direction to go in.
> > archive_command continues to be a mess that we refuse to clean up or
> > even properly document and the project would be much better off by
> > trying to eliminate it rather than add in new ways for users to end up
> > with bad or invalid backups.
> 
> I think the bigger problem with archive_command more comes from how
> it's defined to work tbh. Which leaves a lot of things open.
> 
> This sounds to me like a much narrower use-case, which makes it a lot
> more OK. But I agree we have to be careful not to get back into that
> whole mess. One thing would be to clearly document such things *from
> the beginning*, and not try to retrofit it years later like we ended
> up doing with archive_command.
> 
> And as Robert mentions downthread, the fsync() issue is definitely a
> real one, but if that is documented clearly ahead of time, that's a
> reasonable level foot-gun I'd say.

I think we need to step back and look at the larger issue.  The real
argument goes back to the Unix command-line API vs the VMS/Windows API. 
The former has discrete parts that can be stitched together, while the
VMS/Windows API presents a more duplicative but more holistic API for
every piece.  We have discussed using shell commands for
archive_command, and even more recently, for the server pass phrase.  

To get more specific, I think we have to understand how the
_requirements_ of the job match the shell script API, with stdin,
stdout, stderr, return code, and command-line arguments.  Looking at
archive_command, the command-line arguments allow specification of file
names, but quoting can be complex.  The error return code and stderr
output seem to work fine.  There is no clean API for fsync and testing
if the file exists, so that all that has to be hand done in one
command-line.  This is why many users use pre-written archive_command
shell scripts.

This brings up a few questions:

*  Should we have split apart archive_command into file-exists, copy,
fsync-file?  Should we add that now?

*  How well does this backup requirement match with the shell command
API?

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: debian bugrept involving fast default crash in pg11.7

2020-04-09 Thread Tim Bishop
I've attached a file containing the \d+ for all the tables involved and 
the EXPLAIN ANALYZE for the query.


Yes, the crash happened under v11.6.  I had tried downgrading when I 
first encountered the problem.


While trying to put together this information the crash started 
happening less frequently (I was only able to reproduce it twice, and 
not in a row) and I am unable to confirm if SET 
max_parallel_workers_per_gather=0 had any effect.


Also, since I've been unable to reproduce it again, I'm currently unable to 
provide a corefile or backtrace.  I'll continue to try to reproduce the 
error so I can get one or the other.


I did find a workaround for the crash by making the view 
(v_capacity_score) a materialized view.


Thanks
tim

On 3/28/20 6:30 PM, Justin Pryzby wrote:

I happened across this bugreport, which seems to have just enough information
to be interesting.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=953204
|Version: 11.7-0+deb10u1
|2020-03-05 16:55:55.511 UTC [515] LOG:  background worker "parallel worker" 
(PID 884) was terminated by signal 11: Segmentation fault
|2020-03-05 16:55:55.511 UTC [515] DETAIL:  Failed process was running:
|SELECT distinct student_prob.student_id, student_prob.score, student_name, 
v_capacity_score.capacity
|FROM data JOIN model on model.id = 2 AND data_stage(data) = 
model.target_begin_field_id
|JOIN student_prob ON data.crm_id = student_prob.student_id AND model.id = 
student_prob.model_id AND (student_prob.additional_aid < 1)
|LEFT JOIN v_capacity_score ON data.crm_id = v_capacity_score.student_id AND 
student_prob.model_id = v_capacity_score.model_id
|WHERE data.term_code = '202090' AND student_prob.score > 0
|ORDER BY student_prob.score DESC, student_name
|LIMIT 100 OFFSET 100 ;

Tim: it'd be nice to get more information, if and when possible:
  - "explain" plan for that query;
  - \d for the tables involved: constraints, inheritence, defaults;
  - corefile or backtrace; it looks like there's two different crashes (maybe 
same problem) so both would be useful;
  - Can you reproduce the crash if you "SET max_parallel_workers_per_gather=0" ?
  - Do you know if it crashed under v11.6 ?

If anyone wants to hack on the .deb:
https://packages.debian.org/buster/amd64/postgresql-11/download and (I couldn't 
find the dbg package anywhere else)
https://snapshot.debian.org/package/postgresql-11/11.7-0%2Bdeb10u1/#postgresql-11-dbgsym_11.7-0:2b:deb10u1

$ mkdir pg11
$ cd pg11
$ wget -q 
http://security.debian.org/debian-security/pool/updates/main/p/postgresql-11/postgresql-11_11.7-0+deb10u1_amd64.deb
$ ar x ./postgresql-11_11.7-0+deb10u1_amd64.deb
$ tar xf ./data.tar.xz
$ ar x postgresql-11-dbgsym_11.7-0+deb10u1_amd64.deb
$ tar tf data.tar.xz
$ gdb usr/lib/postgresql/11/bin/postgres

(gdb) set debug-file-directory usr/lib/debug/
(gdb) file usr/lib/postgresql/11/bin/postmaster
(gdb) info target

If I repeat the process Bernhard used (thanks for that) on the first crash in
libc6, I get:

(gdb) find /b 0x00022320, 0x0016839b, 0xf9, 0x20, 0x77, 0x1f, 
0xc5, 0xfd, 0x74, 0x0f, 0xc5, 0xfd, 0xd7, 0xc1, 0x85, 0xc0, 0x0f, 0x85, 0xdf, 
0x00, 0x00, 0x00, 0x48, 0x83, 0xc7, 0x20, 0x83, 0xe1, 0x1f, 0x48, 0x83, 0xe7, 
0xe0, 0xeb, 0x36, 0x66, 0x90, 0x83, 0xe1, 0x1f, 0x48, 0x83, 0xe7, 0xe0, 0xc5, 
0xfd, 0x74, 0x0f, 0xc5, 0xfd, 0xd7, 0xc1, 0xd3, 0xf8, 0x85, 0xc0, 0x74, 0x1b, 
0xf3, 0x0f, 0xbc, 0xc0, 0x48, 0x01, 0xf8, 0x48
0x15c17d <__strlen_avx2+13>
warning: Unable to access 1631 bytes of target memory at 0x167d3d, halting 
search.
1 pattern found.

I'm tentatively guessing that heap_modify_tuple() is involved, since it calls
getmissingattr and (probably) fill_val.  It looks like maybe some data
structure is corrupted which crashed two parallel workers, one in
fill_val()/strlen() and one in heap_deform_tuple()/getmissingattr().  Maybe
something not initialized in parallel worker, or a use-after-free?  I'll stop
guessing.

Justin





                              Table "public.ra_data"
           Column           |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
----------------------------+---------+-----------+----------+---------+---------+--------------+-------------
 tgt_accepted               | boolean |           | not null | false   | plain   |              |
 tgt_active_paid            | boolean |           | not null | false   | plain   |              |
 tgt_application_completed  | boolean |           | not null | false   | plain   |              |
 tgt_application_started    | boolean |           | not null | false   | plain   |              |
 tgt_cancelled_paid         | boolean |           | not null | false   | plain   |              |
 [remainder of the attached \d+ output truncated in the archive]

Re: debian bugrept involving fast default crash in pg11.7

2020-04-09 Thread Tim Bishop

SELECT attrelid::regclass, * FROM pg_attribute WHERE atthasmissing;
-[ RECORD 1 ]-+-
attrelid  | download
attrelid  | 22749
attname   | filetype
atttypid  | 1043
attstattarget | -1
attlen| -1
attnum| 5
attndims  | 0
attcacheoff   | -1
atttypmod | 36
attbyval  | f
attstorage| x
attalign  | i
attnotnull| t
atthasdef | t
atthasmissing | t
attidentity   |
attisdropped  | f
attislocal| t
attinhcount   | 0
attcollation  | 100
attacl|
attoptions|
attfdwoptions |
attmissingval | {csv}


On 4/9/20 2:31 PM, Justin Pryzby wrote:

On Thu, Apr 09, 2020 at 02:05:22PM -0400, Tim Bishop wrote:

I've attached a file containing the \d+ for all the tables involved and the
EXPLAIN ANALYZE for the query.


Thanks for your response.

Do you know if you used the "fast default feature" ?
That would happen if you did "ALTER TABLE tbl ADD col DEFAULT val"

I guess this is the way to tell:
SELECT attrelid::regclass, * FROM pg_attribute WHERE atthasmissing;







Re: Parallel copy

2020-04-09 Thread Robert Haas
On Thu, Apr 9, 2020 at 7:49 AM Dilip Kumar  wrote:
> I agree that if the leader switches the role, then it is possible that
> sometimes the leader might not produce the work before the queue is
> empty.  OTOH, the problem with the approach you are suggesting is that
> the work will be generated on-demand, i.e. there is no specific
> process who is generating the data while workers are busy inserting
> the data.

I think you have a point. The way I think things could go wrong if we
don't have a leader is if it tends to happen that everyone wants new
work at the same time. In that case, everyone will wait at once,
whereas if there is a designated process that aggressively queues up
work, we could perhaps avoid that. Note that you really have to have
the case where everyone wants new work at the exact same moment,
because otherwise they just all take turns finding work for
themselves, and everything is fine, because nobody's waiting for
anybody else to do any work, so everyone is always making forward
progress.

Now on the other hand, if we do have a leader, and for some reason
it's slow in responding, everyone will have to wait. That could happen
either because the leader also has other responsibilities, like
reading data or helping with the main work when the queue is full, or
just because the system is really busy and the leader doesn't get
scheduled on-CPU for a while. I am inclined to think that's likely to
be a more serious problem.

The thing is, the problem of everyone needing new work at the same
time can't really keep on repeating. Say that everyone finishes
processing their first chunk at the same time. Now everyone needs a
second chunk, and in a leaderless system, they must take turns getting
it. So they will go in some order. The ones who go later will
presumably also finish later, so the end times for the second and
following chunks will be scattered. You shouldn't get repeated
pile-ups with everyone finishing at the same time, because each time
it happens, it will force a little bit of waiting that will spread
things out. If they clump up again, that will happen again, but it
shouldn't happen every time.

But in the case where there is a leader, I don't think there's any
similar protection. Suppose we go with the design Vignesh proposes
where the leader switches to processing chunks when the queue is more
than 75% full. If the leader has a "hiccup" where it gets swapped out
or is busy with processing a chunk for a longer-than-normal time, all
of the other processes have to wait for it. Now we can probably tune
this to some degree by adjusting the queue size and fullness
thresholds, but the optimal values for those parameters might be quite
different on different systems, depending on load, I/O performance,
CPU architecture, etc. If there's a system or configuration where the
leader tends not to respond fast enough, it will probably just keep
happening, because nothing in the algorithm will tend to shake it out
of that bad pattern.

I'm not 100% certain that my analysis here is right, so it will be
interesting to hear from other people. However, as a general rule, I
think we want to minimize the amount of work that can only be done by
one process (the leader) and maximize the amount that can be done by
any process with whichever one is available taking on the job. In the
case of COPY FROM STDIN, the reads from the network socket can only be
done by the one process connected to it. In the case of COPY from a
file, even that could be rotated around, if all processes open the
file individually and seek to the appropriate offset.
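
The last idea is simple enough to sketch with plain POSIX calls (illustrative
only, not PostgreSQL code): each worker opens the file on its own and seeks to
the offset assigned to it, so no single reader process is needed.

    #include <fcntl.h>
    #include <unistd.h>

    /* Open the COPY source file and position at this worker's assigned offset. */
    static int
    open_copy_segment(const char *path, off_t start_offset)
    {
        int fd = open(path, O_RDONLY);

        if (fd >= 0 && lseek(fd, start_offset, SEEK_SET) < 0)
        {
            close(fd);
            return -1;
        }
        return fd;                  /* caller reads its chunk from here */
    }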

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-04-09 Thread Alexey Kondratov

On 2020-04-09 16:33, Tom Lane wrote:
> Fujii Masao  writes:
>> On 2020/04/09 16:11, Kyotaro Horiguchi wrote:
>>> At Wed, 08 Apr 2020 16:35:46 -0400, Tom Lane  wrote in
>>>> Why is this getting grafted onto BEGIN/START TRANSACTION in the
>>>> first place?
>>>
>>> The rationale for not being a fmgr function is stated in the following
>>> comments. [...]
>>
>> This issue happens because the function is executed after BEGIN? If yes,
>> what about executing the function (i.e., as separate transaction) before BEGIN?
>> If so, the snapshot taken in the function doesn't affect the subsequent
>> transaction whatever its isolation level is.
>
> I wonder whether making it a procedure, rather than a plain function,
> would help any.



Just another idea, in case one still decides to go with a separate 
statement + BEGIN integration instead of a function: we could use a 
parenthesized options list here. This is already implemented for VACUUM, 
REINDEX, etc. There was an idea to allow CONCURRENTLY in REINDEX that way 
[1], and recently this was proposed again for new options [2], since it 
is much more extensible from the grammar perspective.


That way, the whole feature may look like:

WAIT (LSN '16/B374D848', TIMEOUT 100);

and/or

BEGIN
WAIT (LSN '16/B374D848', WHATEVER_OPTION_YOU_WANT);
...
COMMIT;

It requires only one reserved keyword 'WAIT'. The advantage of this 
approach is that it can be extended to support xid, timestamp, csn or 
anything else, that may be invented in the future, without affecting the 
grammar.


What do you think?

Personally, I find this syntax to be more convenient and human-readable 
compared with function call:


SELECT pg_wait_for_lsn('16/B374D848');
BEGIN;


[1] 
https://www.postgresql.org/message-id/aad2ec49-5142-7356-ffb2-a9b2649cdd1f%402ndquadrant.com


[2] 
https://www.postgresql.org/message-id/20200401060334.GB142683%40paquier.xyz



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: cleaning perl code

2020-04-09 Thread Robert Haas
On Thu, Apr 9, 2020 at 11:44 AM Andrew Dunstan
 wrote:
> We currently only run perlcritic at severity level 5, which is fairly
> permissive. I'd like to reduce that, ideally to, say, level 3, which is
> what I use for the buildfarm code.
>
> But let's start by going to severity level 4.

I continue to be skeptical of perlcritic. I think it complains about a
lot of things which don't matter very much. We should consider whether
the effort it takes to keep it warning-clean has proportionate
benefits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Multiple FPI_FOR_HINT for the same block during killing btree index items

2020-04-09 Thread Alvaro Herrera
On 2020-Apr-09, Masahiko Sawada wrote:

> The inner test in the comment "found the item" never tests the item
> for being dead. So maybe we can add !ItemIdIsDead(iid) to that
> condition. But there is still a race condition where recording multiple
> FPIs can happen. Maybe a better solution is to change the lock to
> exclusive, at least when wal_log_hints = on, so that only one process
> can run this code -- the reduction in concurrency might be won back by
> the fact that we don't wal-log the page multiple times.

I agree.

It seems worth pointing out that when this code was written, these hint
bit changes were not logged, so this consideration did not apply then.
But we added data checksums and wal_log_hints, which changed the
equation.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: debian bugrept involving fast default crash in pg11.7

2020-04-09 Thread Justin Pryzby
On Thu, Apr 09, 2020 at 02:05:22PM -0400, Tim Bishop wrote:
> I've attached a file containing the \d+ for all the tables involved and the
> EXPLAIN ANALYZE for the query.

Thanks for your response.

Do you know if you used the "fast default feature" ?
That would happen if you did "ALTER TABLE tbl ADD col DEFAULT val"

I guess this is the way to tell:
SELECT attrelid::regclass, * FROM pg_attribute WHERE atthasmissing;

-- 
Justin




Re: BUG #16345: ts_headline does not find phrase matches correctly

2020-04-09 Thread Jeff Janes
redirected to hackers.

On Wed, Apr 8, 2020 at 11:02 PM Tom Lane  wrote:

>
> In short then, I propose applying 0001-0006.  I'm not quite sure
> if we should back-patch, or just be content to fix this in HEAD.
> But there's definitely an argument that this has been broken since
> we added phrase search (in 9.6) and deserves to be back-patched.
>
>
Thanks for fixing this.

I am getting a compiler warning, both with and without --enable-cassert.

wparser_def.c: In function 'prsd_headline':
wparser_def.c:2530:2: warning: 'pose' may be used uninitialized in this
function [-Wmaybe-uninitialized]
  mark_fragment(prs, highlightall, bestb, beste);
  ^
wparser_def.c:2384:6: note: 'pose' was declared here
  int   pose,


It makes no sense to me that pose could be used uninitialized on a line
that doesn't use pose at all, so maybe it is a compiler bug or something.

PostgreSQL 13devel-c9b0c67 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, 64-bit

Cheers,

Jeff


Re: Multiple FPI_FOR_HINT for the same block during killing btree index items

2020-04-09 Thread Peter Geoghegan
On Wed, Apr 8, 2020 at 10:56 PM Masahiko Sawada
 wrote:
> Here is the reproducer:

What version of Postgres did you notice the actual customer issue on?
I ask because I wonder if the work on B-Tree indexes in Postgres 12
affects the precise behavior you get here with real world workloads.
It probably makes _bt_killitems() more effective with some workloads,
which naturally increases the likelihood of having multiple FPI issued
in the manner that you describe. OTOH, it might make it less likely
with low cardinality indexes, since large groups of garbage duplicate
tuples tend to get concentrated on just a few leaf pages.

> The inner test in the comment "found the item" never tests the item
> for being dead. So maybe we can add !ItemIdIsDead(iid) to that
> condition. But there is still a race condition where recording multiple
> FPIs can happen. Maybe a better solution is to change the lock to
> exclusive, at least when wal_log_hints = on, so that only one process
> can run this code -- the reduction in concurrency might be won back by
> the fact that we don't wal-log the page multiple times.

I like the idea of checking !ItemIdIsDead(iid) as a further condition
of killing the item -- there is clearly no point in doing work to kill
an item that is already dead. I don't like the idea of using an
exclusive buffer lock (even if it's just with wal_log_hints = on),
though.

-- 
Peter Geoghegan




Re: BUG #16345: ts_headline does not find phrase matches correctly

2020-04-09 Thread Tom Lane
Jeff Janes  writes:
> I am getting a compiler warning, both with and without --enable-cassert.

> wparser_def.c: In function 'prsd_headline':
> wparser_def.c:2530:2: warning: 'pose' may be used uninitialized in this
> function [-Wmaybe-uninitialized]
>   mark_fragment(prs, highlightall, bestb, beste);
>   ^
> wparser_def.c:2384:6: note: 'pose' was declared here
>   int   pose,

I see it too, now that I try a different compiler version.  Will fix.

> It makes no sense to me that pose could be used uninitialized on a line
> that doesn't use pose at all, so maybe it is a compiler bug or something.

It looks like the compiler is doing aggressive inlining, which might
have something to do with the crummy error report placement.  Notice
that this isn't inside 'prsd_headline' at all, so far as the source code
is concerned.
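
A tiny standalone illustration of that effect (nothing to do with the
wparser_def.c code itself, and an assumption about gcc's behavior rather than a
verified reproduction): once the compiler inlines a small function, a
may-be-used-uninitialized diagnostic for a variable inside it can get reported
against a caller's line that never mentions the variable.

    static inline int
    pick(int flag)
    {
        int pose;           /* only assigned when flag is true */

        if (flag)
            pose = 42;
        return pose;        /* the real source of the warning ... */
    }

    int
    use(int flag)
    {
        return pick(flag) + 1;   /* ... but it may be reported here after inlining */
    }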

regards, tom lane




Re: cleaning perl code

2020-04-09 Thread Peter Eisentraut

On 2020-04-09 19:47, Robert Haas wrote:
> On Thu, Apr 9, 2020 at 11:44 AM Andrew Dunstan
>  wrote:
>> We currently only run perlcritic at severity level 5, which is fairly
>> permissive. I'd like to reduce that, ideally to, say, level 3, which is
>> what I use for the buildfarm code.
>>
>> But let's start by going to severity level 4.
>
> I continue to be skeptical of perlcritic. I think it complains about a
> lot of things which don't matter very much. We should consider whether
> the effort it takes to keep it warning-clean has proportionate
> benefits.

Let's see what the patches look like.  At least some of the warnings 
look reasonable, especially in the sense that they are things casual 
Perl programmers might accidentally do wrong.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgsql: Rationalize GetWalRcv{Write,Flush}RecPtr().

2020-04-09 Thread Alvaro Herrera
On 2020-Apr-08, Thomas Munro wrote:

> Rationalize GetWalRcv{Write,Flush}RecPtr().
> 
> GetWalRcvWriteRecPtr() previously reported the latest *flushed*
> location.  Adopt the conventional terminology used elsewhere in the tree
> by renaming it to GetWalRcvFlushRecPtr(), and likewise for some related
> variables that used the term "received".
> 
> Add a new definition of GetWalRcvWriteRecPtr(), which returns the latest
> *written* value.  This will allow later patches to use the value for
> non-data-integrity purposes, without having to wait for the flush
> pointer to advance.

It seems worth pointing out that the new GetWalRcvWriteRecPtr function
has a different signature from the original one -- so any third-party
code using the original function will now get a compile failure that
should alert them that they need to change their code to call
GetWalRcvFlushRecPtr instead.  Maybe we should add a line or two in the
comments of GetWalRcvWriteRecPtr to make this explicit.
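
For reference, the before/after difference is roughly the following (quoted
from memory of the walreceiver.h declarations around this commit; treat the
exact parameter lists as an assumption and check the header):

    /* Flush pointer: keeps the parameters the old write-pointer function had. */
    extern XLogRecPtr GetWalRcvFlushRecPtr(XLogRecPtr *latestChunkStart,
                                           TimeLineID *receiveTLI);

    /* New write pointer: takes no arguments, so code still calling the old
     * two-argument GetWalRcvWriteRecPtr() fails to compile and gets pointed
     * at the rename. */
    extern XLogRecPtr GetWalRcvWriteRecPtr(void);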

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: more ALTER .. DEPENDS ON EXTENSION fixes

2020-04-09 Thread Alvaro Herrera
On 2020-Mar-11, Tom Lane wrote:

> BTW, I did not like the syntax too much.  "NO DEPENDS ON EXTENSION"
> doesn't seem like good English.  "NOT DEPENDS ON EXTENSION" is hardly
> any better.  The real problem with both is that an ALTER action should
> be, well, an action.  A grammar stickler would say that it should be
> "ALTER thing DROP DEPENDENCY ON EXTENSION ext", but perhaps we could
> get away with "ALTER thing DROP DEPENDS ON EXTENSION ext" to avoid
> adding a new keyword.  By that logic the original command should have
> been "ALTER thing ADD DEPENDS ON EXTENSION ext", but I suppose it's
> too late for that.

The problem with DROP DEPENDS is alter_table_cmd, which already defines
"DROP opt_column ColId", so there's a reduce/reduce conflict for the
ALTER INDEX and ALTER MATERIALIZED VIEW forms because "depends" could be
a column name.  (It works fine for ALTER FUNCTION/ROUTINE/PROCEDURE/TRIGGER
because there's no command that tries to define a conflicting DROP form
for these.)

It works if I change DEPENDS to be type_func_name_keyword (currently
unreserved_keyword), but I bet we won't like that.

(DEPENDENCY is not a keyword of any kind, so DROP DEPENDENCY would require us
to make it one of high reservedness, which I suspect we don't like
either).

It would also work to use a different keyword in the DROP position;
maybe REMOVE.  But that's not a keyword currently.

How about ALTER ..  REVOKE DEPENDS or DELETE DEPENDS?  Bison is okay
with either of those forms.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: more ALTER .. DEPENDS ON EXTENSION fixes

2020-04-09 Thread Alvaro Herrera
As promised, here's a rebased version of this patch -- edits pending per
discussion to decide the grammar to use.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 5df1613901906cff4d0b0b7e480691b65d9d2e5c Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 30 Oct 2019 16:57:29 -0300
Subject: [PATCH v2 1/2] Add ALTER .. NO DEPENDS ON

This new command removes any previously added dependency mark; necessary
for completeness.

Discussion: https://postgr.es/m/20200217225333.GA30974@alvherre.pgsql
Reviewed-by: ahsan hadi 
Reviewed-by: Ibrar Ahmed 
Reviewed-by: Tom Lane 
---
 doc/src/sgml/ref/alter_function.sgml  | 10 ++--
 doc/src/sgml/ref/alter_index.sgml |  9 ++--
 doc/src/sgml/ref/alter_materialized_view.sgml | 11 ++---
 doc/src/sgml/ref/alter_trigger.sgml   |  7 ++-
 src/backend/catalog/pg_depend.c   | 49 +++
 src/backend/commands/alter.c  | 24 ++---
 src/backend/nodes/copyfuncs.c |  1 +
 src/backend/nodes/equalfuncs.c|  1 +
 src/backend/parser/gram.y | 36 +-
 src/include/catalog/dependency.h  |  4 ++
 src/include/nodes/parsenodes.h|  1 +
 .../expected/test_extdepend.out   | 39 ++-
 .../test_extensions/sql/test_extdepend.sql| 18 ++-
 13 files changed, 171 insertions(+), 39 deletions(-)

[Patch attachment: the diff body (documentation and grammar changes for
ALTER .. [NO] DEPENDS ON EXTENSION across the 13 files listed above) is
garbled and truncated in the archive.]

Re: doc review for v13

2020-04-09 Thread Michael Paquier
On Wed, Apr 08, 2020 at 11:56:53AM -0500, Justin Pryzby wrote:
> I previously mailed separately about a few individual patches, some of which
> have separate, ongoing discussion and aren't included here (incr sort, 
> parallel
> vacuum).

I have gone through your changes, and committed what looked like the
most obvious mistakes in my view.  Please see below for more
comments.

 required pages to remove both downlink and rightlink are already  locked.  That
-evades potential right to left page locking order, which could deadlock with
+avoid potential right to left page locking order, which could deadlock with
Not sure which one is better, but the new change is grammatically
incorrect.

   auto_explain.log_settings controls whether information
-  about modified configuration options are printed when execution plan is 
logged.
-  Only options affecting query planning with value different from the 
built-in
+  about modified configuration options is printed when an execution plan 
is logged.
+  Only those options which affect query planning and whose value differs 
from its built-in
Depends on how you read the sentence, but here it seems to me that 
"statistics" is the correct subject, no?

-replication is disabled.  Abrupt streaming client disconnection might
+replication is disabled.  Abrupt disconnection of a streaming client 
might
Original looks correct to me here.

 _tz suffix. These functions have been implemented to
-support comparison of date/time values that involves implicit
+support comparison of date/time values that involve implicit
The subject is "comparison" here, no?

 may be included. It also stores the size, last modification time, and
-an optional checksum for each file.
+optionally a checksum for each file.
The original sounds fine to me as well.

Anything related to imath.c needs to be reported upstream, though I
recall reporting these two already.
--
Michael




Re: Multiple FPI_FOR_HINT for the same block during killing btree index items

2020-04-09 Thread Peter Geoghegan
On Thu, Apr 9, 2020 at 1:37 PM James Coleman  wrote:
> We saw the issue on our PG11 clusters. The specific index we noticed
> in the wal dump (I don't think we confirmed if there were others) was
> one on a `created_at` column, to give you an idea of cardinality.

You tend to get a lot of problems with indexes like that when there
are consistent updates (actually, that's more of a thing with an
updated_at index). But non-HOT updates alone might result in what you
could describe as "updates" to the index.

With Postgres 11, a low cardinality index could place new/successor
duplicate index tuples (those needed for non-HOT updates) on a more or
less random leaf page (you'll recall that this is determined by the
old "getting tired" logic). This is the kind of thing I had in mind
when I asked Sawada-san about it.

Was this a low cardinality index in the way I describe? If it was,
then we can hope (and maybe even verify) that the Postgres 12 work
noticeably ameliorates the problem.

-- 
Peter Geoghegan




Re: Catalog invalidations vs catalog scans vs ScanPgRelation()

2020-04-09 Thread Andres Freund
Hi,

On 2020-04-09 18:52:32 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2020-04-09 16:56:03 -0400, Robert Haas wrote:
> >> That seems like a fairly magical coding rule that will happen to work
> >> in most practical cases but isn't really a principled approach to the
> >> problem.
> 
> > I'm not sure it'd be that magical to only release resources at
> > CommitTransactionCommand() time. We kinda do that for a few other things
> > already.
> 
> I'd be worried about consumption of resources during a long transaction.
> But maybe we could release at CommandCounterIncrement?

Which resources are you thinking of? SnapshotResetXmin() shouldn't take
any directly. Obviously it can cause bloat - but where would we use a
snapshot for only some part of a command, but need not have xmin
preserved till the end of the command?


> Still, I tend to agree with Robert that associating a snap with an
> open catalog scan is the right way.

I'm wondering if we should do both. I think releasing xmin in the middle
of a command, but only when invalidations arrive in the middle of it, is
pretty likely to be involved in bugs in the future. But it also seems
good to ensure that snapshots are held across relevant operations.

Any idea how to deal with different types of snapshots potentially being
used within such a sequence?


> I have vague memories that a long time ago, all catalog modifications
> were done via the fetch-from-a-scan-and-update approach.  Starting
> from a catcache tuple instead is a relative newbie. If we're going to
> forbid using a catcache tuple as the starting point for updates, one
> way to enforce it would be to have the catcache not save the TID.

I suspect that that'd be fairly painful code-churn wise.

Greetings,

Andres Freund




Re: Multiple FPI_FOR_HINT for the same block during killing btree index items

2020-04-09 Thread Peter Geoghegan
On Thu, Apr 9, 2020 at 6:47 PM James Coleman  wrote:
> I believe the write pattern to this table likely looks like:
> - INSERT
> - UPDATE
> - DELETE
> for every row. But tomorrow I can do some more digging if needed.

The pg_stats.null_frac for the column/index might be interesting here. I
believe that Active Record will sometimes generate created_at columns
that end up containing NULL values. Not sure why.


--
Peter Geoghegan




Re: [HACKERS] Restricting maximum keep segments by repslots

2020-04-09 Thread Kyotaro Horiguchi
At Wed, 08 Apr 2020 14:19:56 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
me> Just avoiding starting replication when restart_lsn is invalid is
me> sufficient (the attached, which is equivalent to a part of what the
me> invalidated flag did). I think that the error message needs a Hint but
me> it looks on the subscriber side as:

At Wed, 08 Apr 2020 17:02:22 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
me> > At Wed, 08 Apr 2020 14:19:56 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
me> > The latch triggered by checkpoint request by CHECKPOINT command has
me> > been absorbed by ConditionVariableSleep() in
me> > InvalidateObsoleteReplicationSlots.  The attached allows checkpointer
me> > use MyLatch for other than checkpoint request while a checkpoint is
me> > running.
me> 
me> A checkpoint request that happens while waiting on the CV causes a spurious
me> wakeup, but that does no harm.

I added the two above to open items[1] so as not to be forgotten.

[1] https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items#Open_Issues

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Thoughts on "killed tuples" index hint bits support on standby

2020-04-09 Thread Michail Nikolaev
Hello, Peter.

> Let me make sure I understand your position:

> You're particularly concerned about cases where there are relatively
> few page splits, and the standby has to wait for VACUUM to run on the
> primary before dead index tuples get cleaned up. The primary itself
> probably has no problem with setting LP_DEAD bits to avoid having
> index scans visiting the heap unnecessarily. Or maybe the queries are
> different on the standby anyway, so it matters to the standby that
> certain index pages get LP_DEAD bits set quickly, though not to the
> primary (at least not for the same pages). Setting the LP_DEAD bits on
> the standby (in about the same way as we can already on the primary)
> is a "night and day" level difference.
> Right?

Yes, exactly.

My initial attempts were too naive (first and second letter) - but you and
Andres gave me some hints on how to make it reliable.

The main goal is to make the standby able to use and set LP_DEAD bits almost
the way the primary does. Of course, the standby could receive LP_DEAD bits in an
FPI from the primary at any moment - so, some kind of cancellation logic is
required. Also, we should keep the frequency of query cancellation at the same
level - for that reason the LP_DEAD bits had better be used only by standbys with
hot_standby_feedback enabled. So, I am just repeating myself from the previous
letter here.
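
(Just to spell out the assumed setup - this is ordinary configuration, not part
of the patch:

    -- on the standby
    ALTER SYSTEM SET hot_standby_feedback = on;
    SELECT pg_reload_conf();
    -- on the primary, the standby's xmin then shows up in
    SELECT application_name, backend_xmin FROM pg_stat_replication;

so the primary knows which xmin the standby still needs.)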

> And we're willing to account
> for FPIs on the primary (and the LP_DEAD bits set there) just to be
> able to also set LP_DEAD bits on the standby.

Yes, metaphorically speaking - the master sends a WAL record carrying the message:
"Attention, it is possible to receive an FPI from me with LP_DEAD set for a tuple
with xmax=ABCD, so, if you are using LP_DEAD - your xmin should be greater, or you
should cancel yourself". And such a message is required only when this horizon has
moved forward.

And... It looks like it works - queries are much faster, results look correct,
additional WAL traffic is low, cancellation stays at the same level... As far as I
can see - the basic concept is correct and effective (but of course, I
could be missing something).

The patch is hard to look through - I'll try to split it into several patches
later. And of course, a lot of polishing is required (and there are a few places
I am not sure about yet).

Thanks,
Michail.




Properly mark NULL returns in numeric aggregates

2020-04-09 Thread Jesse Zhang
Hi hackers,

We found that several functions -- namely numeric_combine,
numeric_avg_combine, numeric_poly_combine, and int8_avg_combine -- are
returning NULL without signaling the nullity of datum in fcinfo.isnull.
This is obscured by the fact that the only functions in core (finalfunc
for various aggregates) that those return values feed into happen to
tolerate (or rather, not quite distinguish) zero-but-not-NULL trans
values.

In Greenplum, this behavior becomes problematic because Greenplum
serializes internal trans values before spilling the hash table. The
serial functions (numeric_serialize and friends) are strict functions
that will blow up when they are given null (either in the C sense or the
SQL sense) inputs.

In Postgres if we change hash aggregation in the future to spill the
hash table (vis-à-vis the input tuples), this issue would manifest
itself in the final aggregate because we'll serialize the combined (and
likely incorrectly null) trans values.
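
(For what it's worth, the combine paths can be reached from plain SQL by forcing
a parallel aggregate - a rough sketch, where the cost settings are only there to
force the plan and "numeric_tab" stands in for any table with a numeric column:

    SET parallel_setup_cost = 0;
    SET parallel_tuple_cost = 0;
    SET min_parallel_table_scan_size = 0;
    SET max_parallel_workers_per_gather = 2;
    -- the Finalize Aggregate node combines per-worker trans values
    -- via numeric_avg_combine()
    SELECT avg(x) FROM numeric_tab;

which exercises the same combine code without needing hash spilling.)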

Please find attached a small patch fixing said issue.  Originally
reported by Denis Smirnov over at
https://github.com/greenplum-db/gpdb/pull/9878

Cheers,
Jesse and Deep
From edeeaa220fbc3d082d133b29a8c394632588d11a Mon Sep 17 00:00:00 2001
From: Denis Smirnov 
Date: Thu, 9 Apr 2020 16:10:29 -0700
Subject: [PATCH] Properly mark NULL returns in numeric aggregates

When a function returns NULL (in the SQL sense), it is expected to
signal that in fcinfo.isnull . numeric_combine and friends break this
rule by directly returning a NULL datum without signaling the nullity.
This is obscured by the fact that the only functions in core (finalfunc
for various aggregates) that those return values feed into happen to
tolerate (or rather, not quite distinguish) zero-but-not-NULL trans
values.

In Greenplum, this behavior becomes problematic because Greenplum
serializes internal trans values before spilling the hash table. The
serial functions (numeric_serialize and friends) are strict functions
that will blow up when they are given null (either in the C sense or the
SQL sense) inputs.
---
 src/backend/utils/adt/numeric.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 9986132b45e..57896de6edf 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -3946,7 +3946,11 @@ numeric_combine(PG_FUNCTION_ARGS)
 	state2 = PG_ARGISNULL(1) ? NULL : (NumericAggState *) PG_GETARG_POINTER(1);
 
 	if (state2 == NULL)
+	{
+		if (state1 == NULL)
+			PG_RETURN_NULL();
 		PG_RETURN_POINTER(state1);
+	}
 
 	/* manually copy all fields from state2 to state1 */
 	if (state1 == NULL)
@@ -4034,7 +4038,11 @@ numeric_avg_combine(PG_FUNCTION_ARGS)
 	state2 = PG_ARGISNULL(1) ? NULL : (NumericAggState *) PG_GETARG_POINTER(1);
 
 	if (state2 == NULL)
+	{
+		if (state1 == NULL)
+			PG_RETURN_NULL();
 		PG_RETURN_POINTER(state1);
+	}
 
 	/* manually copy all fields from state2 to state1 */
 	if (state1 == NULL)
@@ -4543,7 +4551,11 @@ numeric_poly_combine(PG_FUNCTION_ARGS)
 	state2 = PG_ARGISNULL(1) ? NULL : (PolyNumAggState *) PG_GETARG_POINTER(1);
 
 	if (state2 == NULL)
+	{
+		if (state1 == NULL)
+			PG_RETURN_NULL();
 		PG_RETURN_POINTER(state1);
+	}
 
 	/* manually copy all fields from state2 to state1 */
 	if (state1 == NULL)
@@ -4774,7 +4786,11 @@ int8_avg_combine(PG_FUNCTION_ARGS)
 	state2 = PG_ARGISNULL(1) ? NULL : (PolyNumAggState *) PG_GETARG_POINTER(1);
 
 	if (state2 == NULL)
+	{
+		if (state1 == NULL)
+			PG_RETURN_NULL();
 		PG_RETURN_POINTER(state1);
+	}
 
 	/* manually copy all fields from state2 to state1 */
 	if (state1 == NULL)
-- 
2.26.0



Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-04-09 Thread Fujii Masao




On 2020/04/10 3:16, Alexey Kondratov wrote:

On 2020-04-09 16:33, Tom Lane wrote:

Fujii Masao  writes:

On 2020/04/09 16:11, Kyotaro Horiguchi wrote:

At Wed, 08 Apr 2020 16:35:46 -0400, Tom Lane  wrote in

Why is this getting grafted onto BEGIN/START TRANSACTION in the
first place?



The rationale for not being a fmgr function is stated in the following
comments. [...]



This issue happens because the function is executed after BEGIN? If yes,
what about executing the function (i.e., as separate transaction) before BEGIN?
If so, the snapshot taken in the function doesn't affect the subsequent
transaction whatever its isolation level is.


I wonder whether making it a procedure, rather than a plain function,
would help any.



Just another idea, in case one still decides to go with a separate
statement + BEGIN integration instead of a function. We could use a parenthesized
options list here. This is already implemented for VACUUM, REINDEX, etc. There
was an idea to allow CONCURRENTLY in REINDEX that way [1], and recently this was
proposed again for new options [2], since it is much more extensible from the
grammar perspective.

That way, the whole feature may look like:

WAIT (LSN '16/B374D848', TIMEOUT 100);

and/or

BEGIN
WAIT (LSN '16/B374D848', WHATEVER_OPTION_YOU_WANT);
...
COMMIT;

It requires only one reserved keyword 'WAIT'. The advantage of this approach is
that it can be extended to support xid, timestamp, csn or anything else that
may be invented in the future, without affecting the grammar.

What do you think?

Personally, I find this syntax to be more convenient and human-readable
compared with a function call:

SELECT pg_wait_for_lsn('16/B374D848');
BEGIN;


I can imagine that some users want to specify the LSN to wait for
from the result of another query, for example,
SELECT pg_wait_for_lsn(lsn) FROM xxx. If this is a valid use case,
isn't the function better?
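
For example, something like (a sketch only; pg_wait_for_lsn() is the hypothetical
function name used in this thread, and "targets" an imaginary tracking table):

    -- on the primary, after the writes of interest
    SELECT pg_current_wal_lsn();
    -- on the standby, the LSN can come straight from a query result
    SELECT pg_wait_for_lsn(lsn) FROM targets;
    BEGIN;
    ...
    COMMIT;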

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [Patch] Use internal pthreads reimplementation only when building with MSVC

2020-04-09 Thread Alvaro Herrera
Hello,

On 2020-Apr-08, Sandro Mani wrote:

> The following patch, which we added to build mingw-postgresql on Fedora,
> makes the internal minimal pthreads reimplementation only used when building
> with MSVC, as on MINGW it causes symbol collisions with the symbols provided
> my winpthreads.

Are there any build-system tweaks needed to enable use of winpthreads?
If none are needed, why are all our mingw buildfarm members building
correctly?  I suggest that if you want to maintain "mingw-postgresql
built on Fedora", it would be a good idea to have a buildfarm animal
that tests it on a recurring basis.

Please do submit patches as separate attachments rather than in the
email body.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Catalog invalidations vs catalog scans vs ScanPgRelation()

2020-04-09 Thread Andres Freund
Hi,

On 2020-04-09 16:56:03 -0400, Robert Haas wrote:
> [ belatedly responding ]
> 
> On Sat, Feb 29, 2020 at 3:17 PM Andres Freund  wrote:
> > My preliminary conclusion is that it's simply not safe to do
> > SnapshotResetXmin() from within InvalidateCatalogSnapshot(),
> > PopActiveSnapshot(), UnregisterSnapshotFromOwner() etc. Instead we need
> > to defer the SnapshotResetXmin() call until at least
> > CommitTransactionCommand()? Outside of that there ought (with exception
> > of multi-transaction commands, but they have to be careful anyway) to be
> > no "in progress" sequences of related catalog lookups/modifications.
> >
> > Alternatively we could ensure that all catalog lookup/mod sequences
> > ensure that the first catalog snapshot is registered. But that seems
> > like a gargantuan task?
> 
> If I understand correctly, the scenario you're concerned about is
> something like this:
> 
> (1) Transaction #1 reads a catalog tuple and immediately releases its 
> snapshot.
> (2) Transaction #2 performs a DELETE or UPDATE on that catalog tuple.
> (3) Transaction #3 completes a VACUUM on the table, so that the old
> tuple is pruned, thus marked dead, and then the TID is marked unused.
> (4) Transaction #4 performs an INSERT which reuses the same TID.
> (5) Transaction #1 now performs a DELETE or UPDATE using the previous
> TID and updates the unrelated tuple which reused the TID rather than
> the intended tuple.

Pretty much.

I don't think it's necessary for 3) and 4) to happen in quite that way,
though. If 3) is just HOT pruned away, or 3) happens but 4) doesn't, we'd
still be in trouble:
Currently heap_update/delete has no non-assert check that the
passed in TID is an existing tuple.

lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
Assert(ItemIdIsNormal(lp));
..
oldtup.t_tableOid = RelationGetRelid(relation);
oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
oldtup.t_len = ItemIdGetLength(lp);
oldtup.t_self = *otid;
...
modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
                                              &oldtup, newtup);

so we'll treat the page header as a tuple. Not likely to end well.



> It seems to me that what is supposed to prevent this from happening is
> that you aren't supposed to release your snapshot at the end of step
> #1. You're supposed to hold onto it until after step #5 is complete. I
> think that there are fair number of places that are already careful
> about that. I just picked a random source file that I knew Tom had
> written and found this bit in extension_config_remove:
> 
> extScan = systable_beginscan(extRel, ExtensionOidIndexId, true,
>  NULL, 1, key);
> 
> extTup = systable_getnext(extScan);
> ...a lot more stuff...
> CatalogTupleUpdate(extRel, &extTup->t_self, extTup);
> 
> systable_endscan(extScan);
> 
> Quite apart from this issue, there's a very good reason why it's like
> that: extTup might be pointing right into a disk buffer, and if we did
> systable_endscan() before the last access to it, our pointer could
> become invalid. A fair number of places are protected due to the scan
> being kept open like this, but it looks like most of the ones that use
> SearchSysCacheCopyX + CatalogTupleUpdate are problematic.

Indeed. There are unfortunately quite a few of those.  There are also a few
places, most prominently probably performMultipleDeletions(), that
explicitly do searches and then afterwards perform deletions - without
holding a snapshot.


> I would be inclined to fix this problem by adjusting those places to
> keep a snapshot open rather than by making some arbitrary rule about
> holding onto a catalog snapshot until the end of the command.

That's what my prototype patch did. It's doable, although we would need
more complete assertions than I had added to ensure we're not
introducing more broken places.

While my patch did that, for correctness I don't think it can just be
something like
snap = RegisterSnapshot(GetLatestSnapshot());
or
PushActiveSnapshot(GetTransactionSnapshot());

as neither will be the catalog snapshot, which could be older than
GetLatestSnapshot()/GetTransactionSnapshot(). But IIRC we also can't
just register the catalog snapshot, because some parts of the system
will use a "normal" snapshot instead (which could be older).


> That seems like a fairly magical coding rule that will happen to work
> in most practical cases but isn't really a principled approach to the
> problem.

I'm not sure it'd be that magical to only release resources at
CommitTransactionCommand() time. We kinda do that for a few other things
already.


> Besides being magical, it's also fragile: just deciding to
> use a some other snapshot instead of the catalog snapshot causes your
> code to be subtly broken in a way you're surely not going to 

Re: where should I stick that backup?

2020-04-09 Thread Bruce Momjian
On Thu, Apr  9, 2020 at 04:15:07PM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (br...@momjian.us) wrote:
> > I think we need to step back and look at the larger issue.  The real
> > argument goes back to the Unix command-line API vs the VMS/Windows API. 
> > The former has discrete parts that can be stitched together, while the
> > VMS/Windows API presents a more duplicative but more holistic API for
> > every piece.  We have discussed using shell commands for
> > archive_command, and even more recently, for the server pass phrase.  
> 
> When it comes to something like the server pass phrase, it seems much
> more reasonable to consider using a shell script (though still perhaps
> not ideal) because it's not involved directly in ensuring that the data
> is reliably stored and it's pretty clear that if it doesn't work the
> worst thing that happens is that the database doesn't start up, but it
> won't corrupt any data or destroy it or do other bad things.

Well, the pass phrase relates to security, so it is important too.  I
don't think the _importance_ of the action is the most determining
issue.  Rather, I think it is how well the action fits the shell script
API.

> > To get more specific, I think we have to understand how the
> > _requirements_ of the job match the shell script API, with stdin,
> > stdout, stderr, return code, and command-line arguments.  Looking at
> > archive_command, the command-line arguments allow specification of file
> > names, but quoting can be complex.  The error return code and stderr
> > output seem to work fine.  There is no clean API for fsync and testing
> > if the file exists, so that all that has to be hand done in one
> > command-line.  This is why many users use pre-written archive_command
> > shell scripts.
> 
> We aren't considering all of the use-cases really though, in specific,
> things like pushing to s3 or gcs require, at least, good retry logic,
> and that's without starting to think about things like high-rate systems
> (spawning lots of new processes isn't free, particularly if they're
> written in shell script but any interpreted language is expensive) and
> wanting to parallelize.

Good point, but if there are multiple APIs, it makes shell script
flexibility even more useful.

> > This brings up a few questions:
> > 
> > *  Should we have split apart archive_command into file-exists, copy,
> > fsync-file?  Should we add that now?
> 
> No..  The right approach to improving on archive command is to add a way
> for an extension to take over that job, maybe with a complete background
> worker of its own, or perhaps a shared library that can be loaded by the
> archiver process, at least if we're talking about how to allow people to
> extend it.

That seems quite vague, which is the issue we had years ago when
considering doing archive_command as a link to a C library.

> Potentially a better answer is to just build this stuff into PG- things
> like "archive WAL to s3/GCS with these credentials" are what an awful
> lot of users want.  There's then some who want "archive first to this
> other server, and then archive to s3/GCS", or more complex options.

Yes, we certainly know how to do a file system copy, but what about
copying files to other things like S3?  I don't know how we would do
that and allow users to change things like file paths or URLs.
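
For reference, the local-copy case that the current API handles well is the
documented one-liner (shown here via SQL purely for illustration):

    ALTER SYSTEM SET archive_command =
        'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f';
    SELECT pg_reload_conf();

It is the S3/GCS-style targets, with retries and credentials, that don't reduce
to something that simple.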

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Multiple FPI_FOR_HINT for the same block during killing btree index items

2020-04-09 Thread James Coleman
On Thu, Apr 9, 2020 at 8:32 PM Peter Geoghegan  wrote:
>
> On Thu, Apr 9, 2020 at 5:25 PM Peter Geoghegan  wrote:
> > Was this a low cardinality index in the way I describe? If it was,
> > then we can hope (and maybe even verify) that the Postgres 12 work
> > noticeably ameliorates the problem.
>
> What I really meant was an index where hundreds or even thousands of
> rows for each distinct timestamp value are expected. Not an index
> where almost every row has a distinct timestamp value. Both timestamp
> index patterns are common, obviously.

I'll try to run some numbers tomorrow to confirm, but I believe that
the created_at value is almost (if not completely) unique. So, no,
it's not a low cardinality case like that.

I believe the write pattern to this table likely looks like:
- INSERT
- UPDATE
- DELETE
for every row. But tomorrow I can do some more digging if needed.

James




Re: Multiple FPI_FOR_HINT for the same block during killing btree index items

2020-04-09 Thread Masahiko Sawada
On Fri, 10 Apr 2020 at 04:05, Peter Geoghegan  wrote:
>
> On Wed, Apr 8, 2020 at 10:56 PM Masahiko Sawada
>  wrote:
> > Here is the reproducer:
>
> What version of Postgres did you notice the actual customer issue on?
> I ask because I wonder if the work on B-Tree indexes in Postgres 12
> affects the precise behavior you get here with real world workloads.
> It probably makes _bt_killitems() more effective with some workloads,
> which naturally increases the likelihood of having multiple FPI issued
> in the manner that you describe. OTOH, it might make it less likely
> with low cardinality indexes, since large groups of garbage duplicate
> tuples tend to get concentrated on just a few leaf pages.
>
> > The inner test in the comment "found the item" never tests the item
> > for being dead. So maybe we can add !ItemIdIsDead(iid) to that
> > condition. But there is still a race condition where recording multiple
> > FPIs can happen. Maybe a better solution is to change the lock to
> > exclusive, at least when wal_log_hints = on, so that only one process
> > can run this code -- the reduction in concurrency might be won back by
> > the fact that we don't wal-log the page multiple times.
>
> I like the idea of checking !ItemIdIsDead(iid) as a further condition
> of killing the item -- there is clearly no point in doing work to kill
> an item that is already dead. I don't like the idea of using an
> exclusive buffer lock (even if it's just with wal_log_hints = on),
> though.
>

Okay. I think only adding the check would also help with reducing the
likelihood. How about the changes for the current HEAD I've attached?

Related to this behavior on btree indexes, this can happen even on
heaps while searching heap tuples. To reduce the likelihood of that
more generally, I wonder if we can acquire a lock on the buffer descriptor
right before XLogSaveBufferForHint() and set a flag on the buffer
descriptor that indicates that we're about to log an FPI for a hint bit, so
that concurrent processes can be aware of that.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


btree_fpi.patch
Description: Binary data


Re: Catalog invalidations vs catalog scans vs ScanPgRelation()

2020-04-09 Thread Tom Lane
Andres Freund  writes:
> On 2020-04-09 16:56:03 -0400, Robert Haas wrote:
>> That seems like a fairly magical coding rule that will happen to work
>> in most practical cases but isn't really a principled approach to the
>> problem.

> I'm not sure it'd be that magical to only release resources at
> CommitTransactionCommand() time. We kinda do that for a few other things
> already.

I'd be worried about consumption of resources during a long transaction.
But maybe we could release at CommandCounterIncrement?

Still, I tend to agree with Robert that associating a snap with an
open catalog scan is the right way.  I have vague memories that a long
time ago, all catalog modifications were done via the fetch-from-a-
scan-and-update approach.  Starting from a catcache tuple instead
is a relative newbie.

If we're going to forbid using a catcache tuple as the starting point
for updates, one way to enforce it would be to have the catcache
not save the TID.

regards, tom lane




Re: Multiple FPI_FOR_HINT for the same block during killing btree index items

2020-04-09 Thread Peter Geoghegan
On Thu, Apr 9, 2020 at 5:25 PM Peter Geoghegan  wrote:
> Was this a low cardinality index in the way I describe? If it was,
> then we can hope (and maybe even verify) that the Postgres 12 work
> noticeably ameliorates the problem.

What I really meant was an index where hundreds or even thousands of
rows for each distinct timestamp value are expected. Not an index
where almost every row has a distinct timestamp value. Both timestamp
index patterns are common, obviously.

-- 
Peter Geoghegan




Re: [BUG] non archived WAL removed during production crash recovery

2020-04-09 Thread Kyotaro Horiguchi
By the way, I hadn't noticed that Cc: didn't contain -hackers.  Added
it.


At Thu, 9 Apr 2020 11:35:12 +0200, Jehan-Guillaume de Rorthais 
 wrote in 
> On Thu, 09 Apr 2020 11:26:57 +0900 (JST)
> Kyotaro Horiguchi  wrote:
> [...]
> > > > At Thu, 2 Apr 2020 15:49:15 +0200, Jehan-Guillaume de Rorthais
> > > >  wrote in   
> > > > > > Ok, so our *current* consensus seems the followings. Right?
> > > > > > 
> > > > > > - If archive_mode=off, any WAL files with .ready files are removed in
> > > > > > crash recovery, archive recovery and standby mode.
> > > > > 
> > > > > yes
> > > > 
> > > > If archive_mode = off no WAL files are marked as ".ready".  
> > > 
> > > Sure, on the primary side.
> > > 
> > > What if you build a standby from a backup with archive_mode=on with
> > > some .ready files in there?   
> > 
> > Well. A backup doesn't have anything in the archive_status directory if it is
> > taken by pg_basebackup. If the backup is created some other way, it can
> > have some (as Fujii-san mentioned).  Master with archive_mode != off
> > and standby with archive_mode=always should archive WAL files that are
> > not marked .done, but standby with archive_mode == on should not. The
> > commit intended that
> 
> Unless I'm wrong, the commit avoids creating .ready files on standby when a WAL
> has neither a .done nor a .ready status file.

Right.

> > but the mistake here is it thinks that inRecovery represents whether it is
> > running as a standby or not, but actually it is true on primary during crash
> > recovery.
> 
> Indeed.
> 
> > On the other hand, with the patch, standby with archive_mode=on
> > wrongly archives WAL files during crash recovery.
> 
> "without the patch" you mean? You are talking about 78ea8b5daab, right?

No. I meant the v4 patch in [1].

[1] https://www.postgresql.org/message-id/20200407171736.61906608%40firost

Prior to applying the patch (that is, the commit 78ea..),
XLogArchiveCheckDone() correctly returns true (= allows removing it)
for the same condition.

The proposed patch does the following thing.

if (!XLogArchivingActive() ||
recoveryState == ARCHIVE_RECOVERING && !XLogArchivingAlways())
return true;

It doesn't return for the condition "recoveryState=CRASH_RECOVERING
and archive_mode = on". Then the WAL files are mistakenly marked as
".ready" if not already marked.

By the way, the code doesn't seem to follow the surrounding convention
here. Let the inserted code be in the same style as the existing code
around it.

+   if ( ! XLogArchivingActive() )

I don't think we put spaces within the parentheses as in the above.

| ARCHIVE_RECOVERING/CRASH_RECOVERING/NOT_IN_RECOVERY

The first two and the last one are in different styles. *I* would prefer them
(if we use them) to be "IN_ARCHIVE_RECOVERY/IN_CRASH_RECOVERY/NOT_IN_RECOVERY".

> > What we should check there is, as the commit was intended, not whether
> > it is under crash or archive recovery, but whether it is running as
> > primary or standby.
> 
> Yes.
> 
> > > > If it is "always", WAL files that are to be archived are
> > > > marked as ".ready".  Finally, the condition reduces to:
> > > > 
> > > > If archiver is running, archive ".ready" files. Otherwise ignore
> > > > ".ready" and just remove WAL files after use.  
> > > > > 
> > > > > > That is, WAL files with .ready files are removed when either
> > > > > > archive_mode!=always in standby mode or archive_mode=off.
> > > > > 
> > > > > sounds fine to me.
> > > > 
> > > > That situation implies that archive_mode has been changed.  
> > > 
> > > Why? archive_mode may have been "always" on the primary when eg. a snapshot
> > > has been created.
> > 
> > .ready files are created only when archive_mode != off.
> 
> Yes, on a primary, they are created when archive_mode > off. On standby, when
> archive_mode=always. If a primary had archive_mode=always, a standby created
> from its backup will still have archive_mode=always, with no changes.
> Maybe I miss your point, sorry.

Sorry, it was ambiguous.

> That is, WAL files with .ready files are removed when either
> archive_mode!=always in standby mode or archive_mode=off.

If we have a .ready file when archive_mode = off, the cluster (or the
original that the cluster was copied from) must have been running with
archive_mode = on or always. That is, archive_mode has been changed. But
anyway, that discussion is not of much relevance here.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: doc review for v13

2020-04-09 Thread Justin Pryzby
On Fri, Apr 10, 2020 at 11:27:46AM +0900, Michael Paquier wrote:
> On Wed, Apr 08, 2020 at 11:56:53AM -0500, Justin Pryzby wrote:
> > I previously mailed separately about a few individual patches, some of which
> > have separate, ongoing discussion and aren't included here (incr sort, 
> > parallel
> > vacuum).
> 
> I have gone through your changes, and committed what looked like the
> most obvious mistakes in my view.  Please see below for more
> comments.

Thanks - rebased for cfbot and continued review.

>  required pages to remove both downlink and rightlink are already  locked.  
> That
> -evades potential right to left page locking order, which could deadlock with
> +avoid potential right to left page locking order, which could deadlock with
> Not sure which one is better, but the new change is grammatically
> incorrect.

Thanks for noticing

"Evades" usually means to act to avoid detection by the government.  Like tax
evasion.

>auto_explain.log_settings controls whether 
> information
> -  about modified configuration options are printed when execution plan 
> is logged.
> -  Only options affecting query planning with value different from the 
> built-in
> +  about modified configuration options is printed when an execution plan 
> is logged.
> +  Only those options which affect query planning and whose value differs 
> from its built-in
> Depends on how you read the sentence, but here it seems to me that
> "statistics" is the correct subject, no?

Statistics ?

>  _tz suffix. These functions have been implemented to
> -support comparison of date/time values that involves implicit
> +support comparison of date/time values that involve implicit
> The subject is "comparison" here, no?

You're right.

-- 
Justin
>From e18b7dbdb3b75047540691929a7ef2b55c0b58e2 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 29 Mar 2020 19:51:32 -0500
Subject: [PATCH v2 01/12] doc: psql opclass/opfamily

commit b0b5e20cd8d1a58a8782d5dc806a5232db116e2f
Author: Alexander Korotkov 

ALSO, should we rename the "Purpose" column?  I see we have pg_amop.amoppurpose
so maybe it's fine ?
---
 doc/src/sgml/ref/psql-ref.sgml | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 0595d1c04b..cdd24fad98 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1244,7 +1244,7 @@ testdb=
 (see ).
 If access-method-patttern
 is specified, only operator classes associated with access methods whose
-names match pattern are listed.
+names match the pattern are listed.
 If input-type-pattern
 is specified, only operator classes associated with input types whose
 names match the pattern are listed.
@@ -1267,7 +1267,7 @@ testdb=
 (see ).
 If access-method-patttern
 is specified, only operator families associated with access methods whose
-names match pattern are listed.
+names match the pattern are listed.
 If input-type-pattern
 is specified, only operator families associated with input types whose
 names match the pattern are listed.
@@ -1291,7 +1291,7 @@ testdb=
 ().
 If access-method-patttern
 is specified, only members of operator families associated with access
-methods whose names match pattern are listed.
+methods whose names match the pattern are listed.
 If input-type-pattern
 is specified, only members of operator families whose names match the
 pattern are listed.
@@ -1314,7 +1314,7 @@ testdb=
 ().
 If access-method-patttern
 is specified, only members of operator families associated with access
-methods whose names match pattern are listed.
+methods whose names match the pattern are listed.
 If input-type-pattern
 is specified, only members of operator families whose names match the
 pattern are listed.
-- 
2.17.0

>From 650c29ca5ce500337bd0ff7345f9cfcb60dd596a Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 29 Mar 2020 21:31:08 -0500
Subject: [PATCH v2 02/12] doc: percent encoding

commit e0ed6817c0ee218a3681920807404603e042ff04
Author: Michael Paquier 
---
 doc/src/sgml/libpq.sgml | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 75d2224a61..8b9af95c14 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -925,11 +925,11 @@ postgresql:///mydb?host=localhostport=5433

 

-Connection URI needs to be encoded with 
+A connection URI needs to be encoded with 
 https://tools.ietf.org/html/rfc3986#section-2.1;>Percent-encoding 
-if it includes symbols with special meaning in any of its parts. 
-Here is an example where equal sign (=) is replaced
-with %3D and whitespace character with

Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-09 Thread Amit Kapila
On Thu, Apr 9, 2020 at 2:03 PM Justin Pryzby  wrote:
>
> On Thu, Apr 09, 2020 at 05:07:48PM +0900, Masahiko Sawada wrote:
> > Yes but the difference is that we cannot disable PARSER or COPY by
> > specifying options whereas we can do something like "VACUUM (FULL
> > false) tbl" to disable FULL option. I might be misunderstanding the
> > meaning of "specify" though.
>
> You have it right.
>
> We should fix the behavior, but change the error message for consistency with
> that change, like so.
>

Okay, but I think the error message suggested by Robert "ERROR: VACUUM
FULL cannot be performed in parallel" sounds better than what you have
proposed.  What do you think?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: SyncRepLock acquired exclusively in default configuration

2020-04-09 Thread Fujii Masao



On 2020/04/08 3:01, Ashwin Agrawal wrote:


On Mon, Apr 6, 2020 at 2:14 PM Andres Freund  wrote:

 > How about we change it to this ?

Hm. Better. But I think it might need at least a compiler barrier /
volatile memory load?  Unlikely here, but otherwise the compiler could
theoretically just stash the variable somewhere locally (it's not likely
to be a problem because it'd not be long ago that we acquired an lwlock,
which is a full barrier).


That's the part, I am not fully sure about. But reading the comment above 
SyncRepUpdateSyncStandbysDefined(), it seems fine.

 > Bring back the check which existed based on GUC but instead of just 
blindly
 > returning based on just GUC not being set, check
 > WalSndCtl->sync_standbys_defined. Thoughts?

Hm. Is there any reason not to just check
WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined()
and WalSndCtl->sync_standbys_defined?


Agree, just checking for WalSndCtl->sync_standbys_defined seems fine.


So the consensus is something like the following? Patch attached.

/*
-* Fast exit if user has not requested sync replication.
+* Fast exit if user has not requested sync replication, or there are no
+* sync replication standby names defined.
 */
-   if (!SyncRepRequested())
+   if (!SyncRepRequested() ||
+   !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
return;

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/replication/syncrep.c 
b/src/backend/replication/syncrep.c
index ffd5b31eb2..c6df3f4da6 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -163,9 +163,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
mode = Min(SyncRepWaitMode, SYNC_REP_WAIT_FLUSH);
 
/*
-* Fast exit if user has not requested sync replication.
+* Fast exit if user has not requested sync replication, or there are no
+* sync replication standby names defined.
 */
-   if (!SyncRepRequested())
+   if (!SyncRepRequested() ||
+   !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
return;
 
Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));


Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-09 Thread Amit Kapila
On Thu, Apr 9, 2020 at 7:31 PM Robert Haas  wrote:
>
> On Thu, Apr 9, 2020 at 1:36 AM Amit Kapila  wrote:
> > On Thu, Apr 9, 2020 at 7:07 AM Michael Paquier  wrote:
> > > On Wed, Apr 08, 2020 at 01:38:54PM -0500, Justin Pryzby wrote:
> > > > I think the behavior is correct, but the error message could be 
> > > > improved, like:
> > > > |ERROR:  cannot specify FULL with PARALLEL jobs
> > > > or similar.
> > >
> > > Perhaps "cannot use FULL and PARALLEL options together"?
> > >
> >
> > We already have a similar message "cannot specify both PARSER and COPY
> > options", so I think the current message is fine.
>
> So, whether the error message is OK depends on the details. The
> situation as I understand it is that a vacuum cannot be both parallel
> and full. If you disallow every command that includes both key words,
> then the message seems fine. But suppose you allow
>
> VACUUM (PARALLEL 1, FULL 0) foo;
>
> There's no technical problem here, because the vacuum is not both
> parallel and full. But if you allow that case, then there is an error
> message problem, perhaps, because your error message says that you
> cannot specify both options, but here you did specify both options,
> and yet it worked. So really if this case is allowed a more accurate
> error message would be something like:
>
> ERROR: VACUUM FULL cannot be performed in parallel
>
> But if you used this latter error message yet disallowed VACUUM
> (PARALLEL 1, FULL 0) then it again wouldn't make sense, because the
> error message would be now forbidding something that you never tried
> to do.
>
> The point is that we need to decide whether we're going to complain
> whenever both options are specified in the syntax, or whether we're
> going to complain when they're combined in a way that we don't
> support.
>

I would prefer later as I don't find it a good idea to unnecessarily
block some syntax.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Multiple FPI_FOR_HINT for the same block during killing btree index items

2020-04-09 Thread Kyotaro Horiguchi
At Fri, 10 Apr 2020 12:32:31 +0900, Masahiko Sawada 
 wrote in 
> On Fri, 10 Apr 2020 at 04:05, Peter Geoghegan  wrote:
> >
> > On Wed, Apr 8, 2020 at 10:56 PM Masahiko Sawada
> >  wrote:
> > > Here is the reproducer:
> >
> > What version of Postgres did you notice the actual customer issue on?
> > I ask because I wonder if the work on B-Tree indexes in Postgres 12
> > affects the precise behavior you get here with real world workloads.
> > It probably makes _bt_killitems() more effective with some workloads,
> > which naturally increases the likelihood of having multiple FPI issued
> > in the manner that you describe. OTOH, it might make it less likely
> > with low cardinality indexes, since large groups of garbage duplicate
> > tuples tend to get concentrated on just a few leaf pages.
> >
> > > The inner test in the comment "found the item" never tests the item
> > > for being dead. So maybe we can add !ItemIdIsDead(iid) to that
> > > condition. But there is still a race condition where recording multiple
> > > FPIs can happen. Maybe a better solution is to change the lock to
> > > exclusive, at least when wal_log_hints = on, so that only one process
> > > can run this code -- the reduction in concurrency might be won back by
> > > the fact that we don't wal-log the page multiple times.
> >
> > I like the idea of checking !ItemIdIsDead(iid) as a further condition
> > of killing the item -- there is clearly no point in doing work to kill
> > an item that is already dead. I don't like the idea of using an
> > exclusive buffer lock (even if it's just with wal_log_hints = on),
> > though.
> >
> 
> Okay. I think only adding the check would also help with reducing the
> likelihood. How about the changes for the current HEAD I've attached?

FWIW, looks good to me.

> Related to this behavior on btree indexes, this can happen even on
> heaps during searching heap tuples. To reduce the likelihood of that
> more generally I wonder if we can acquire a lock on buffer descriptor
> right before XLogSaveBufferForHint() and set a flag to the buffer
> descriptor that indicates that we're about to log FPI for hint bit so
> that concurrent process can be aware of that.

Makes sense if the lock is acquired just before the (BM_DIRTY |
BM_JUST_DIRTIED) check.  Could we use double-checking, similar to
the patch for ItemIdIsDead()?

> if ((pg_atomic_read_u32(>state) & (BM_DIRTY | BM_JUST_DIRTIED)) !=
> (BM_DIRTY | BM_JUST_DIRTIED))
> {
...
>   * essential that CreateCheckpoint waits for virtual transactions
>   * rather than full transactionids.
>   */
>  /* blah, blah */   
>  buf_state = LockBufHdr(bufHdr);
>
>  if (buf_state & (BM_ | BM_JUST) != (..))
>  {
>MyProc->delayChkpt = delayChkpt = true;
>lsn = XLogSaveBufferForHint(buffer, buffer_std);
>  }
>}
>else
> buf_state = LockBuffer(bufHdr);
  

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: SyncRepLock acquired exclusively in default configuration

2020-04-09 Thread Masahiko Sawada
On Fri, 10 Apr 2020 at 13:20, Fujii Masao  wrote:
>
>
>
> On 2020/04/08 3:01, Ashwin Agrawal wrote:
> >
> > On Mon, Apr 6, 2020 at 2:14 PM Andres Freund  wrote:
> >
> >  > How about we change it to this ?
> >
> > Hm. Better. But I think it might need at least a compiler barrier /
> > volatile memory load?  Unlikely here, but otherwise the compiler could
> > theoretically just stash the variable somewhere locally (it's not likely
> > to be a problem because it'd not be long ago that we acquired an lwlock,
> > which is a full barrier).
> >
> >
> > That's the part, I am not fully sure about. But reading the comment above 
> > SyncRepUpdateSyncStandbysDefined(), it seems fine.
> >
> >  > Bring back the check which existed based on GUC but instead of just 
> > blindly
> >  > returning based on just GUC not being set, check
> >  > WalSndCtl->sync_standbys_defined. Thoughts?
> >
> > Hm. Is there any reason not to just check
> > WalSndCtl->sync_standbys_defined? rather than both 
> > !SyncStandbysDefined()
> > and WalSndCtl->sync_standbys_defined?
> >
> >
> > Agree, just checking for WalSndCtl->sync_standbys_defined seems fine.
>
> So the consensus is something like the following? Patch attached.
>
>  /*
> -* Fast exit if user has not requested sync replication.
> +* Fast exit if user has not requested sync replication, or there are 
> no
> +* sync replication standby names defined.
>   */
> -   if (!SyncRepRequested())
> +   if (!SyncRepRequested() ||
> +   !((volatile WalSndCtlData *) 
> WalSndCtl)->sync_standbys_defined)
>  return;
>

I think we need more comments describing why checking
sync_standby_defined without SyncRepLock is safe here. For example:

This routine gets called at every commit time. So, to check whether
synchronous standbys are defined as quickly as possible, we check
WalSndCtl->sync_standbys_defined without acquiring SyncRepLock. Since
we make this test unlocked, there's a chance we might fail to notice
that it has been turned off and continue processing. But since the
subsequent check will look at it again while holding SyncRepLock, that's
no problem. Similarly, even if we fail to notice that it has been
turned on, it's okay to return quickly since all backends consistently
behave that way.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-09 Thread Masahiko Sawada
On Fri, 10 Apr 2020 at 14:04, Amit Kapila  wrote:
>
> On Thu, Apr 9, 2020 at 2:03 PM Justin Pryzby  wrote:
> >
> > On Thu, Apr 09, 2020 at 05:07:48PM +0900, Masahiko Sawada wrote:
> > > Yes but the difference is that we cannot disable PARSER or COPY by
> > > specifying options whereas we can do something like "VACUUM (FULL
> > > false) tbl" to disable FULL option. I might be misunderstanding the
> > > meaning of "specify" though.
> >
> > You have it right.
> >
> > We should fix the behavior, but change the error message for consistency 
> > with
> > that change, like so.
> >
>
> Okay, but I think the error message suggested by Robert "ERROR: VACUUM
> FULL cannot be performed in parallel" sounds better than what you have
> proposed.  What do you think?

I totally agree.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: segmentation fault using currtid and partitioned tables

2020-04-09 Thread Michael Paquier
On Wed, Apr 08, 2020 at 04:13:31PM +0900, Michael Paquier wrote:
> I have been looking at the tree and the use of the table AM APIs, and
> those TID lookups are really a particular case compared to the other
> callers of the table AM callbacks.  Anyway, I have not spotted similar
> problems, so I find very tempting the option to just add some
> RELKIND_HAS_STORAGE() to tid.c where it matters and call it a day.

Playing more with this stuff, it happens that we have zero code
coverage for currtid() and currtid2(), and the main user of those
functions I can find around is the ODBC driver:
https://coverage.postgresql.org/src/backend/utils/adt/tid.c.gcov.html

There are multiple cases to consider, particularly for views:
- Case of a view with ctid as an attribute taken from the table.
- Case of a view with ctid as an attribute of an incorrect attribute
type.
It is worth noting that all those code paths can trigger various
elog() errors, which is not something that a user should be able to do
using a SQL-callable function.  There are also two code paths for
cases where a view has no SELECT rule or more than one, which cannot
normally be reached.

All in all, I propose something like the attached to patch up the
surroundings with tests to cover everything I could think of, which I
guess had better be backpatched?  While on it, I have noticed that we
lack coverage for max(tid) and min(tid), so I have included a bonus
test.
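
A rough idea of the shape of the coverage added (names here are placeholders;
the real tests are in the attached patch):

    CREATE TABLE currtid_tab (a int);
    INSERT INTO currtid_tab VALUES (1);
    CREATE VIEW currtid_view AS SELECT a, ctid FROM currtid_tab;
    SELECT currtid2('currtid_view'::text, '(0,1)'::tid);  -- view exposing a usable ctid
    CREATE VIEW currtid_view_no_ctid AS SELECT a FROM currtid_tab;
    SELECT currtid2('currtid_view_no_ctid'::text, '(0,1)'::tid);  -- error, no CTID
    SELECT max(ctid), min(ctid) FROM currtid_tab;  -- the bonus min/max(tid) coverage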

Another issue is that we don't have any documentation for those
functions; the best fit for that would be a subsection for TID
operators under "Functions and Operators"?

For now, I am adding a patch to the next CF so that we don't forget about
this set of issues.  Any thoughts?
--
Michael
diff --git a/src/backend/utils/adt/tid.c b/src/backend/utils/adt/tid.c
index 4ce8375eab..90e03e890d 100644
--- a/src/backend/utils/adt/tid.c
+++ b/src/backend/utils/adt/tid.c
@@ -31,6 +31,7 @@
 #include "parser/parsetree.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 #include "utils/varlena.h"
@@ -300,20 +301,41 @@ currtid_for_view(Relation viewrel, ItemPointer tid)
 	for (i = 0; i < natts; i++)
 	{
 		Form_pg_attribute attr = TupleDescAttr(att, i);
+		char	   *attname = NameStr(attr->attname);
 
-		if (strcmp(NameStr(attr->attname), "ctid") == 0)
+		if (strcmp(attname, "ctid") == 0)
 		{
 			if (attr->atttypid != TIDOID)
-elog(ERROR, "ctid isn't of type TID");
+ereport(ERROR,
+		(errcode(ERRCODE_DATATYPE_MISMATCH),
+		 errmsg("cannot use attribute \"%s\" of type %s on view \"%s.%s\"",
+attname, format_type_be(attr->atttypid),
+get_namespace_name(RelationGetNamespace(viewrel)),
+RelationGetRelationName(viewrel;
 			tididx = i;
 			break;
 		}
 	}
+
 	if (tididx < 0)
-		elog(ERROR, "currtid cannot handle views with no CTID");
+		ereport(ERROR,
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot look at latest visible tid for (%u, %u) on view \"%s.%s\" without CTID",
+		ItemPointerGetBlockNumberNoCheck(tid),
+		ItemPointerGetOffsetNumberNoCheck(tid),
+		get_namespace_name(RelationGetNamespace(viewrel)),
+		RelationGetRelationName(viewrel;
+
 	rulelock = viewrel->rd_rules;
+
 	if (!rulelock)
-		elog(ERROR, "the view has no rules");
+		elog(ERROR,
+			 "cannot look at latest visible tid for (%u, %u) on view \"%s.%s\" with no rules",
+			 ItemPointerGetBlockNumberNoCheck(tid),
+			 ItemPointerGetOffsetNumberNoCheck(tid),
+			 get_namespace_name(RelationGetNamespace(viewrel)),
+			 RelationGetRelationName(viewrel));
+
 	for (i = 0; i < rulelock->numLocks; i++)
 	{
 		rewrite = rulelock->rules[i];
@@ -323,7 +345,13 @@ currtid_for_view(Relation viewrel, ItemPointer tid)
 			TargetEntry *tle;
 
 			if (list_length(rewrite->actions) != 1)
-elog(ERROR, "only one select rule is allowed in views");
+elog(ERROR,
+	 "cannot look at latest visible tid for (%u, %u) on view \"%s.%s\" with multiple SELECT rules",
+	 ItemPointerGetBlockNumberNoCheck(tid),
+	 ItemPointerGetOffsetNumberNoCheck(tid),
+	 get_namespace_name(RelationGetNamespace(viewrel)),
+	 RelationGetRelationName(viewrel));
+
 			query = (Query *) linitial(rewrite->actions);
 			tle = get_tle_by_resno(query->targetList, tididx + 1);
 			if (tle && tle->expr && IsA(tle->expr, Var))
@@ -345,10 +373,21 @@ currtid_for_view(Relation viewrel, ItemPointer tid)
 			break;
 		}
 	}
-	elog(ERROR, "currtid cannot handle this view");
+
+	elog(ERROR, "could not look at latest visible tid for (%u, %u) on view \"%s.%s\"",
+		 ItemPointerGetBlockNumberNoCheck(tid),
+		 ItemPointerGetOffsetNumberNoCheck(tid),
+		 get_namespace_name(RelationGetNamespace(viewrel)),
+		 RelationGetRelationName(viewrel));
 	return (Datum) 0;
 }
 
+/*
+ *	currtid_byreloid
+ *		Return the latest visible tid of a relation's tuple, associated
+ *		to the tid given in input.  This is a wrapper for currtid(), and
+ *		uses in 

Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-09 Thread Masahiko Sawada
On Thu, 9 Apr 2020 at 14:52, Amit Kapila  wrote:
>
> On Thu, Apr 9, 2020 at 12:09 AM Justin Pryzby  wrote:
> >
> > On Thu, Apr 09, 2020 at 12:06:04AM +0530, Mahendra Singh Thalor wrote:
> > > On Wed, 8 Apr 2020 at 22:11, Justin Pryzby  wrote:
> > > >
> > >
> > > Thanks Justin for the patch.
> > >
> > > Patch looks fine to me and it is fixing the issue. After applying this
> > > patch, vacuum will work as:
> > >
> > > 1) vacuum (parallel 1, full 0);
> > > -- vacuuming will be done with 1 parallel worker.
> > > 2) vacuum (parallel 0, full 1);
> > > -- full vacuuming will be done.
> > > 3) vacuum (parallel 1, full 1);
> > > -- will give error :ERROR:  cannot specify both FULL and PARALLEL options
> > >
> > > 3rd example is telling that we can't give both FULL and PARALLEL
> > > options but in 1st and 2nd, we are giving both FULL and PARALLEL
> > > options and we are not giving any error. I think, irrespective of
> > > value of both FULL and PARALLEL options, we should give error in all
> > > the above mentioned three cases.
> >
> > I think the behavior is correct, but the error message could be improved,
> >
>
> Yeah, I also think that the behavior is fine.

Me too.

> We can do what Mahendra
> is saying but that will unnecessarily block some syntax and we might
> need to introduce an extra variable to detect that in code.

ISTM we have been using the expression "specifying the option" in log
messages when a user wrote the option in the command. But now that
VACUUM command options can take a true/false value, it doesn't make sense to
say something like "if the option is specified we cannot do that". So maybe
"cannot turn on FULL and PARALLEL options" or something similar would be
better, I think.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-09 Thread Amit Kapila
On Thu, Apr 9, 2020 at 12:14 PM Michael Paquier  wrote:
>
> On Thu, Apr 09, 2020 at 11:05:50AM +0530, Amit Kapila wrote:
> > On Thu, Apr 9, 2020 at 7:07 AM Michael Paquier  wrote:
> >>  I think that
> >> this patch needs tests in sql/vacuum.sql.
> >
> > We already have one test that is testing an invalid combination of
> > PARALLEL and FULL option, not sure of adding more on similar lines is
> > a good idea, but we can do that if it makes sense.  What more tests
> > you have in mind which make sense here?
>
> As you say, vacuum.sql includes this test:
> VACUUM (PARALLEL 2, FULL TRUE) pvactst; -- error, cannot use both PARALLEL and FULL
> ERROR:  cannot specify both FULL and PARALLEL options
>
> But based on the discussion of this thread, it seems to me that we had
> better stress more option combinations, particularly the two following
> ones:
> vacuum (full 0, parallel 1) foo;
> vacuum (full 1, parallel 0) foo;
>
> Without that, how do you make sure that the compatibility wanted does
> not break again in the future?  As of HEAD, the first one passes and
> the second one fails.  And as Tushar is telling us we want to
> handle both cases in a consistent way.
>

We can add more tests to validate the syntax, but my worry was to not
increase the test runtime by doing a (parallel) vacuum.  So maybe we can do
such syntax validation on empty tables, or do you have a better idea?
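
Something like the following on an empty table should keep the added runtime
negligible (a sketch; the table name is just a placeholder):

    CREATE TABLE vac_option_tab (a int);   -- left empty, so each command is cheap
    VACUUM (FULL false, PARALLEL 1) vac_option_tab;  -- ok
    VACUUM (FULL true, PARALLEL 0) vac_option_tab;   -- should be ok per this thread
    VACUUM (FULL true, PARALLEL 1) vac_option_tab;   -- error
    DROP TABLE vac_option_tab;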



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-04-09 Thread Kyotaro Horiguchi
At Wed, 08 Apr 2020 16:35:46 -0400, Tom Lane  wrote in 
> Anna Akenteva  writes:
> > I'd like to hear others' opinions on the syntax as well.
> 
> Pardon me for coming very late to the party, but it seems like there are
> other questions that ought to be answered before we worry about any of
> this.  Why is this getting grafted onto BEGIN/START TRANSACTION in the
> first place?  It seems like it would be just as useful as a separate
> command, if not more so.  You could always start a transaction just
> after waiting.  Conversely, there might be reasons to want to wait
> within an already-started transaction.
> 
> If it could survive as a separate command, then I'd humbly suggest
> that it requires no grammar work at all.  You could just invent one
> or more functions that take suitable parameters.

The rationale for this not being an fmgr function is stated in the following
comments.

https://www.postgresql.org/message-id/CAEepm%3D0V74EApmfv%3DMGZa24Ac_pV1vGrp3Ovnv-3rUXwxu9epg%40mail.gmail.com
| because it doesn't work for our 2 higher isolation levels as
| mentioned."

https://www.postgresql.org/message-id/CA%2BTgmob-aG3Lqh6OpvMDYTNR5eyq94VugyEejyk7pLhE9uwnyA%40mail.gmail.com

| IMHO, trying to do this using a function-based interface is a really
| bad idea for exactly the reasons you mention.  I don't see why we'd
| resist the idea of core syntax here; transactions are a core part of
| PostgreSQL.

It seemed to me that they suggested it be part of the BEGIN
command, but the next proposed patch implemented a "WAIT FOR" command,
for reasons unclear to me.  I don't object to a separate command if
it is more useful than being part of the BEGIN command.

By the way, for example, pg_current_wal_lsn() is a volatile function
and repeated calls within a SERIALIZABLE transaction can return
different values.
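
For instance (illustration only):

    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT pg_current_wal_lsn();  -- volatile, so this call ...
    SELECT pg_current_wal_lsn();  -- ... and this one may return different values
    COMMIT;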

If there's no necessity for this feature to be a core command, I think
I would like it to be a function.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Commitfest 2020-03 Now in Progress

2020-04-09 Thread Julien Rouhaud
On Thu, Apr 9, 2020 at 04:12, Alvaro Herrera  wrote:

> On 2020-Apr-08, David Steele wrote:
>
> > The 2020-03 Commitfest is officially closed.
> >
> > Final stats are (for entire CF, not just from March 1 this time):
>
> Thanks for managing!
>

Thanks a lot for the hard work!

>


Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-09 Thread Etsuro Fujita
On Thu, Apr 9, 2020 at 2:36 PM Tom Lane  wrote:
> Etsuro Fujita  writes:
> > Yeah, partition_bounds_merge() is currently called only from
> > try_partitionwise_join(), which guarantees that the strategies are the
> > same.

> If there's only one caller and there's not likely to ever be more,
> then I tend to agree that you don't need the assertion.

It seems unlikely that partition_bounds_merge() will be called from
more places in the foreseeable future, so I'd still vote for removing
the assertion.

Best regards,
Etsuro Fujita




Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-09 Thread Michael Paquier
On Thu, Apr 09, 2020 at 11:05:50AM +0530, Amit Kapila wrote:
> On Thu, Apr 9, 2020 at 7:07 AM Michael Paquier  wrote:
>>  I think that
>> this patch needs tests in sql/vacuum.sql.
> 
> We already have one test that is testing an invalid combination of
> PARALLEL and FULL option, not sure of adding more on similar lines is
> a good idea, but we can do that if it makes sense.  What more tests
> you have in mind which make sense here?

As you say, vacuum.sql includes this test:
VACUUM (PARALLEL 2, FULL TRUE) pvactst; -- error, cannot use both PARALLEL and 
FULL
ERROR:  cannot specify both FULL and PARALLEL options

But based on the discussion of this thread, it seems to me that we had
better stress more option combinations, particularly the two following
ones:
vacuum (full 0, parallel 1) foo;
vacuum (full 1, parallel 0) foo;

Without that, how do you make sure that the compatibility wanted does
not break again in the future?  As of HEAD, the first one passes and
the second one fails.  And as Tushar is telling us we want to 
handle both cases in a consistent way.
--
Michael


signature.asc
Description: PGP signature


Re: adding partitioned tables to publications

2020-04-09 Thread Amit Langote
On Thu, Apr 9, 2020 at 4:14 PM Peter Eisentraut
 wrote:
> On 2020-04-09 05:39, Amit Langote wrote:
> > sub_viaroot ERROR:  number of columns (2601) exceeds limit (1664)
> > sub_viaroot CONTEXT:  slot "sub_viaroot", output plugin "pgoutput", in
> > the change callback, associated LSN 0/1621010
>
> I think the problem is that in maybe_send_schema(),
> RelationClose(ancestor) releases the relcache entry, but the tuple
> descriptors, which are part of the relcache entry, are still pointed to
> by the tuple map.
>
> This patch makes the tests pass for me:
>
> diff --git a/src/backend/replication/pgoutput/pgoutput.c
> b/src/backend/replication/pgoutput/pgoutput.c
> index 5fbf2d4367..cf6e8629c1 100644
> --- a/src/backend/replication/pgoutput/pgoutput.c
> +++ b/src/backend/replication/pgoutput/pgoutput.c
> @@ -305,7 +305,7 @@ maybe_send_schema(LogicalDecodingContext *ctx,
>
>  /* Map must live as long as the session does. */
>  oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> -   relentry->map = convert_tuples_by_name(indesc, outdesc);
> +   relentry->map =
> convert_tuples_by_name(CreateTupleDescCopy(indesc),
> CreateTupleDescCopy(outdesc));
>  MemoryContextSwitchTo(oldctx);
>  send_relation_and_attrs(ancestor, ctx);
>  RelationClose(ancestor);
>
> Please check.

Thanks.  Yes, that's what I just found out too, and I was about to send
a patch, which is basically the same as yours as far as the fix for this
issue is concerned.

While figuring this out, I thought the nearby code could be rearranged
a bit, especially to de-duplicate the code.  Also, I think
get_rel_sync_entry() may be a better place to set the map, rather than
maybe_send_schema().  Thoughts?

-- 

Amit Langote
EnterpriseDB: http://www.enterprisedb.com


logicalrep-partition-code-fixes.patch
Description: Binary data


Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-09 Thread Amit Kapila
On Thu, Apr 9, 2020 at 11:54 AM Masahiko Sawada
 wrote:
>
> On Thu, 9 Apr 2020 at 14:52, Amit Kapila  wrote:
> >
>
> > We can do what Mahendra
> > is saying but that will unnecessarily block some syntax and we might
> > need to introduce an extra variable to detect that in code.
>
> ISTM we have been using the expression "specifying the option" in log
> messages when a user wrote the option in the command. But now that
> VACUUM command options can have a true/false it doesn't make sense to
> say like "if the option is specified we cannot do that". So maybe
> "cannot turn on FULL and PARALLEL options" or something would be
> better, I think.
>

Sure, we can change that, but doesn't the existing similar message
"cannot specify both PARSER and COPY options" also occur when both
options have valid values?  If so, we can use a similar principle here,
no?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: adding partitioned tables to publications

2020-04-09 Thread Peter Eisentraut

On 2020-04-09 05:39, Amit Langote wrote:

sub_viaroot ERROR:  number of columns (2601) exceeds limit (1664)
sub_viaroot CONTEXT:  slot "sub_viaroot", output plugin "pgoutput", in
the change callback, associated LSN 0/1621010


I think the problem is that in maybe_send_schema(), 
RelationClose(ancestor) releases the relcache entry, but the tuple 
descriptors, which are part of the relcache entry, are still pointed to 
by the tuple map.


This patch makes the tests pass for me:

diff --git a/src/backend/replication/pgoutput/pgoutput.c 
b/src/backend/replication/pgoutput/pgoutput.c

index 5fbf2d4367..cf6e8629c1 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -305,7 +305,7 @@ maybe_send_schema(LogicalDecodingContext *ctx,

/* Map must live as long as the session does. */
oldctx = MemoryContextSwitchTo(CacheMemoryContext);
-   relentry->map = convert_tuples_by_name(indesc, outdesc);
+   relentry->map = 
convert_tuples_by_name(CreateTupleDescCopy(indesc), 
CreateTupleDescCopy(outdesc));

MemoryContextSwitchTo(oldctx);
send_relation_and_attrs(ancestor, ctx);
RelationClose(ancestor);

Please check.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-09 Thread Michael Paquier
On Thu, Apr 09, 2020 at 12:33:57PM +0530, Amit Kapila wrote:
> We can add more tests to validate the syntax, but my worry was to not
> increase test timing by doing (parallel) vacuum.  So maybe we can do
> such syntax validation on empty tables or you have any better idea?

Using empty tables for positive tests is the least expensive option.
--
Michael


signature.asc
Description: PGP signature


Re: PG compilation error with Visual Studio 2015/2017/2019

2020-04-09 Thread davinder singh
On Wed, Apr 8, 2020 at 7:39 PM Juan José Santamaría Flecha

> Let me explain further, in pg_config_os.h you can check that the value of
> _WIN32_WINNT is solely based on checking _MSC_VER. This patch should also
> be meaningful for WIN32 builds using MinGW, or we might see this issue
> reappear in those  systems if update the  MIN_WINNT value to more current
> OS versions. So, I still think  _WIN32_WINNT is a better option.
>
Thanks for the explanation, I was not aware of that.  You are right, it
makes sense to use "_WIN32_WINNT", so now I am using only that.

I still see the same last lines in both #ifdef blocks, and pgindent might
> change a couple of lines to:
> + MultiByteToWideChar(CP_ACP, 0, winlocname, -1, wc_locale_name,
> + LOCALE_NAME_MAX_LENGTH);
> +
> + if ((GetLocaleInfoEx(wc_locale_name, LOCALE_SNAME,
> + (LPWSTR), LOCALE_NAME_MAX_LENGTH)) > 0)
> + {
>
I have now addressed these comments as well.  Please check the updated
version of the patch.


> Please open an item in the commitfest for this patch.
>
I have created one with the same title.


-- 
Regards,
Davinder
EnterpriseDB: http://www.enterprisedb.com


0001-PG-compilation-error-with-VS-2015-2017-2019.patch
Description: Binary data


Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-04-09 Thread Fujii Masao




On 2020/04/09 16:11, Kyotaro Horiguchi wrote:

At Wed, 08 Apr 2020 16:35:46 -0400, Tom Lane  wrote in

Anna Akenteva  writes:

I'd like to hear others' opinions on the syntax as well.


Pardon me for coming very late to the party, but it seems like there are
other questions that ought to be answered before we worry about any of
this.  Why is this getting grafted onto BEGIN/START TRANSACTION in the
first place?  It seems like it would be just as useful as a separate
command, if not more so.  You could always start a transaction just
after waiting.  Conversely, there might be reasons to want to wait
within an already-started transaction.

If it could survive as a separate command, then I'd humbly suggest
that it requires no grammar work at all.  You could just invent one
or more functions that take suitable parameters.


The rationale for not being a fmgr function is stated in the following
comments.


This issue happens because the function is executed after BEGIN?  If yes,
what about executing the function (i.e., as a separate transaction) before
BEGIN?  If so, the snapshot taken in the function doesn't affect the
subsequent transaction, whatever its isolation level is.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-09 Thread Justin Pryzby
On Thu, Apr 09, 2020 at 05:07:48PM +0900, Masahiko Sawada wrote:
> Yes but the difference is that we cannot disable PARSER or COPY by
> specifying options whereas we can do something like "VACUUM (FULL
> false) tbl" to disable FULL option. I might be misunderstanding the
> meaning of "specify" though.

You have it right.

We should fix the behavior, but change the error message for consistency with
that change, like so.

-- 
Justin
>From 206d4f2e05360e9d02bcf3b941f9f8ea99f5f8a9 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 9 Apr 2020 03:29:13 -0500
Subject: [PATCH v2] Allow specifying "(parallel 0)" with "vacuum(full)"..

..even though full implies parallel 0

Discussion:
https://www.postgresql.org/message-id/58c8d171-e665-6fa3-a9d3-d9423b694dae%40enterprisedb.com
---
 src/backend/commands/vacuum.c| 2 +-
 src/test/regress/expected/vacuum.out | 5 +++--
 src/test/regress/sql/vacuum.sql  | 3 ++-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 660c854d49..645162d4bd 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -200,7 +200,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 	if ((params.options & VACOPT_FULL) && params.nworkers > 0)
 		ereport(ERROR,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot specify both FULL and PARALLEL options")));
+ errmsg("cannot specify both FULL and parallel workers")));
 
 	/*
 	 * Make sure VACOPT_ANALYZE is specified if any column lists are present.
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index 0cfe28e63f..bc2c3db8d8 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -115,14 +115,15 @@ LINE 1: VACUUM (PARALLEL -1) pvactst;
 ^
 VACUUM (PARALLEL 2, INDEX_CLEANUP FALSE) pvactst;
 VACUUM (PARALLEL 2, FULL TRUE) pvactst; -- error, cannot use both PARALLEL and FULL
-ERROR:  cannot specify both FULL and PARALLEL options
+ERROR:  cannot specify both FULL and parallel workers
+VACUUM (PARALLEL 0, FULL TRUE) pvactst; -- can specify parallel disable (even though that's implied by FULL)
 VACUUM (PARALLEL) pvactst; -- error, cannot use PARALLEL option without parallel degree
 ERROR:  parallel option requires a value between 0 and 1024
 LINE 1: VACUUM (PARALLEL) pvactst;
 ^
 CREATE TEMPORARY TABLE tmp (a int PRIMARY KEY);
 CREATE INDEX tmp_idx1 ON tmp (a);
-VACUUM (PARALLEL 1) tmp; -- disables parallel vacuum option
+VACUUM (PARALLEL 1) tmp; -- parallel vacuum disabled for temp tables
 WARNING:  disabling parallel option of vacuum on "tmp" --- cannot vacuum temporary tables in parallel
 RESET min_parallel_index_scan_size;
 DROP TABLE pvactst;
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index cf741f7b11..9743db35c7 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -99,10 +99,11 @@ VACUUM (PARALLEL 0) pvactst; -- disable parallel vacuum
 VACUUM (PARALLEL -1) pvactst; -- error
 VACUUM (PARALLEL 2, INDEX_CLEANUP FALSE) pvactst;
 VACUUM (PARALLEL 2, FULL TRUE) pvactst; -- error, cannot use both PARALLEL and FULL
+VACUUM (PARALLEL 0, FULL TRUE) pvactst; -- can specify parallel disable (even though that's implied by FULL)
 VACUUM (PARALLEL) pvactst; -- error, cannot use PARALLEL option without parallel degree
 CREATE TEMPORARY TABLE tmp (a int PRIMARY KEY);
 CREATE INDEX tmp_idx1 ON tmp (a);
-VACUUM (PARALLEL 1) tmp; -- disables parallel vacuum option
+VACUUM (PARALLEL 1) tmp; -- parallel vacuum disabled for temp tables
 RESET min_parallel_index_scan_size;
 DROP TABLE pvactst;
 
-- 
2.17.0



Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-09 Thread Justin Pryzby
On Thu, Apr 09, 2020 at 12:31:55PM +0530, Amit Kapila wrote:
> Sure, we can change that, but isn't the existing example of similar
> message "cannot specify both PARSER and COPY options"  occurs when
> both the options have valid values?  If so, we can use a similar
> principle here, no?

A better comparison is with this one:

src/bin/pg_dump/pg_restore.c:   pg_log_error("cannot specify both 
--single-transaction and multiple jobs");

but it doesn't say just: "..specify both --single and --jobs", which would be
wrong in the same way, and which we already dealt with some time ago:

commit 14a4f6f3748df4ff63bb2d2d01146b2b98df20ef
Author: Alvaro Herrera 
Date:   Tue Apr 14 00:06:35 2009 +

pg_restore -jN does not equate "multiple jobs", so partly revert the
previous patch.

Per note from Tom.

-- 
Justin




Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-09 Thread Masahiko Sawada
On Thu, 9 Apr 2020 at 16:02, Amit Kapila  wrote:
>
> On Thu, Apr 9, 2020 at 11:54 AM Masahiko Sawada
>  wrote:
> >
> > On Thu, 9 Apr 2020 at 14:52, Amit Kapila  wrote:
> > >
> >
> > > We can do what Mahendra
> > > is saying but that will unnecessarily block some syntax and we might
> > > need to introduce an extra variable to detect that in code.
> >
> > ISTM we have been using the expression "specifying the option" in log
> > messages when a user wrote the option in the command. But now that
> > VACUUM command options can have a true/false it doesn't make sense to
> > say like "if the option is specified we cannot do that". So maybe
> > "cannot turn on FULL and PARALLEL options" or something would be
> > better, I think.
> >
>
> Sure, we can change that, but isn't the existing example of similar
> message "cannot specify both PARSER and COPY options"  occurs when
> both the options have valid values?  If so, we can use a similar
> principle here, no?

Yes but the difference is that we cannot disable PARSER or COPY by
specifying options whereas we can do something like "VACUUM (FULL
false) tbl" to disable FULL option. I might be misunderstanding the
meaning of "specify" though.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PG compilation error with Visual Studio 2015/2017/2019

2020-04-09 Thread Juan José Santamaría Flecha
On Thu, Apr 9, 2020 at 1:56 PM Ranier Vilela  wrote:

> Attached, your patch with those considerations.
>
I see no attachment.

Regards


Re: Parallel copy

2020-04-09 Thread Amit Kapila
On Thu, Apr 9, 2020 at 1:00 AM Robert Haas  wrote:
>
> On Tue, Apr 7, 2020 at 9:38 AM Ants Aasma  wrote:
> >
> > With option 1 it's not possible to read input data into shared memory
> > and there needs to be an extra memcpy in the time critical sequential
> > flow of the leader. With option 2 data could be read directly into the
> > shared memory buffer. With future async io support, reading and
> > looking for tuple boundaries could be performed concurrently.
>
> But option 2 still seems significantly worse than your proposal above, right?
>
> I really think we don't want a single worker in charge of finding
> tuple boundaries for everybody. That adds a lot of unnecessary
> inter-process communication and synchronization. Each process should
> just get the next tuple starting after where the last one ended, and
> then advance the end pointer so that the next process can do the same
> thing. Vignesh's proposal involves having a leader process that has to
> switch roles - he picks an arbitrary 25% threshold - and if it doesn't
> switch roles at the right time, performance will be impacted. If the
> leader doesn't get scheduled in time to refill the queue before it
> runs completely empty, workers will have to wait. Ants's scheme avoids
> that risk: whoever needs the next tuple reads the next line. There's
> no need to ever wait for the leader because there is no leader.
>

Hmm, I think in his scheme also there is a single reader process.  See
the email above [1] where he described how it should work.  I think
the difference is in the division of work.  AFAIU, in Ants's scheme the
worker needs to pick its work from a tuple_offset queue, whereas in
Vignesh's scheme it will be based on size (each worker will get
probably 64KB of work).  In Ants's scheme the main thing to figure out
is how many tuple offsets to assign to each worker in one go, so that
we don't unnecessarily add contention for finding the work unit.  I
think we need to find the right balance between size and number of
tuples.  I am considering size here because larger tuples will
probably require more time, as we need to allocate more space for them
and they probably also require more processing time.  One way to
achieve that could be that each worker tries to claim 500 tuples (or
some other threshold number), but if their size is greater than 64K
(or some other threshold size) then the worker tries with a smaller
number of tuples, such that the size of the chunk of tuples stays
below the threshold size.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel copy

2020-04-09 Thread Amit Kapila
On Thu, Apr 9, 2020 at 4:20 PM Amit Kapila  wrote:
>
> On Thu, Apr 9, 2020 at 1:00 AM Robert Haas  wrote:
> >
> > On Tue, Apr 7, 2020 at 9:38 AM Ants Aasma  wrote:
> > >
> > > With option 1 it's not possible to read input data into shared memory
> > > and there needs to be an extra memcpy in the time critical sequential
> > > flow of the leader. With option 2 data could be read directly into the
> > > shared memory buffer. With future async io support, reading and
> > > looking for tuple boundaries could be performed concurrently.
> >
> > But option 2 still seems significantly worse than your proposal above, 
> > right?
> >
> > I really think we don't want a single worker in charge of finding
> > tuple boundaries for everybody. That adds a lot of unnecessary
> > inter-process communication and synchronization. Each process should
> > just get the next tuple starting after where the last one ended, and
> > then advance the end pointer so that the next process can do the same
> > thing. Vignesh's proposal involves having a leader process that has to
> > switch roles - he picks an arbitrary 25% threshold - and if it doesn't
> > switch roles at the right time, performance will be impacted. If the
> > leader doesn't get scheduled in time to refill the queue before it
> > runs completely empty, workers will have to wait. Ants's scheme avoids
> > that risk: whoever needs the next tuple reads the next line. There's
> > no need to ever wait for the leader because there is no leader.
> >
>
> Hmm, I think in his scheme also there is a single reader process.  See
> the email above [1] where he described how it should work.
>

oops, I forgot to specify the link to the email.  See
https://www.postgresql.org/message-id/CANwKhkO87A8gApobOz_o6c9P5auuEG1W2iCz0D5CfOeGgAnk3g%40mail.gmail.com


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel copy

2020-04-09 Thread Dilip Kumar
On Thu, Apr 9, 2020 at 1:00 AM Robert Haas  wrote:
>
> On Tue, Apr 7, 2020 at 9:38 AM Ants Aasma  wrote:
> > I think the element based approach and requirement that all tuples fit
> > into the queue makes things unnecessarily complex. The approach I
> > detailed earlier allows for tuples to be bigger than the buffer. In
> > that case a worker will claim the long tuple from the ring queue of
> > tuple start positions, and starts copying it into its local line_buf.
> > This can wrap around the buffer multiple times until the next start
> > position shows up. At that point this worker can proceed with
> > inserting the tuple and the next worker will claim the next tuple.
> >
> > This way nothing needs to be resized, there is no risk of a file with
> > huge tuples running the system out of memory because each element will
> > be reallocated to be huge and the number of elements is not something
> > that has to be tuned.
>
> +1. This seems like the right way to do it.
>
> > > We had a couple of options for the way in which queue elements can be 
> > > stored.
> > > Option 1:  Each element (DSA chunk) will contain tuples such that each
> > > tuple will be preceded by the length of the tuple.  So the tuples will
> > > be arranged like (Length of tuple-1, tuple-1), (Length of tuple-2,
> > > tuple-2),  Or Option 2: Each element (DSA chunk) will contain only
> > > tuples (tuple-1), (tuple-2), .  And we will have a second
> > > ring-buffer which contains a start-offset or length of each tuple. The
> > > old design used to generate one tuple of data and process tuple by
> > > tuple. In the new design, the server will generate multiple tuples of
> > > data per queue element. The worker will then process data tuple by
> > > tuple. As we are processing the data tuple by tuple, I felt both of
> > > the options are almost the same. However Design1 was chosen over
> > > Design 2 as we can save up on some space that was required by another
> > > variable in each element of the queue.
> >
> > With option 1 it's not possible to read input data into shared memory
> > and there needs to be an extra memcpy in the time critical sequential
> > flow of the leader. With option 2 data could be read directly into the
> > shared memory buffer. With future async io support, reading and
> > looking for tuple boundaries could be performed concurrently.
>
> But option 2 still seems significantly worse than your proposal above, right?
>
> I really think we don't want a single worker in charge of finding
> tuple boundaries for everybody. That adds a lot of unnecessary
> inter-process communication and synchronization. Each process should
> just get the next tuple starting after where the last one ended, and
> then advance the end pointer so that the next process can do the same
> thing. Vignesh's proposal involves having a leader process that has to
> switch roles - he picks an arbitrary 25% threshold - and if it doesn't
> switch roles at the right time, performance will be impacted. If the
> leader doesn't get scheduled in time to refill the queue before it
> runs completely empty, workers will have to wait. Ants's scheme avoids
> that risk: whoever needs the next tuple reads the next line. There's
> no need to ever wait for the leader because there is no leader.

I agree that if the leader switches roles, then it is possible that
sometimes the leader might not produce work before the queue is
empty.  OTOH, the problem with the approach you are suggesting is that
the work will be generated on demand, i.e. there is no dedicated
process generating the data while the workers are busy inserting it.
So IMHO, if we have a dedicated leader process then there will always
be work available for all the workers.  I agree that we need to find
the correct point at which the leader should work as a worker.  One
idea could be that when the queue is full and there is no space to
push more work onto it, the leader itself processes that work.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Default setting for enable_hashagg_disk

2020-04-09 Thread Tomas Vondra

On Tue, Apr 07, 2020 at 05:39:01PM -0500, Justin Pryzby wrote:

On Tue, Apr 07, 2020 at 11:20:46AM -0700, Jeff Davis wrote:

The enable_hashagg_disk GUC, if set to true, chooses HashAgg based on
costing. If false, it only generates a HashAgg path if it thinks it will fit
in work_mem, similar to the old behavior (though it wlil now spill to disk if
the planner was wrong about it fitting in work_mem).  The current default is
true.


Are there any other GUCs that behave like that ?  It's confusing to me when I
see "Disk Usage: ... kB", despite setting it to "disable", and without the
usual disable_cost.  I realize that postgres chose the plan on the hypothesis
that it would *not* exceed work_mem, and that spilling to disk is considered
preferable to ignoring the setting, and that "going back" to planning phase
isn't a possibility.



Is it really any different from our enable_* GUCs? Even if you do e.g.
enable_sort=off, we may still do a sort. Same for enable_groupagg etc.
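
As a minimal illustration of that point (hypothetical session; the exact
plan shape can vary):

SET enable_sort = off;
EXPLAIN (COSTS OFF) SELECT * FROM generate_series(1, 1000) g ORDER BY g;
--  Sort
--    Sort Key: g.g
--    ->  Function Scan on generate_series g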


template1=# explain (analyze, costs off, summary off) SELECT a, COUNT(1) FROM 
generate_series(1,99) a GROUP BY 1 ;
HashAggregate (actual time=1370.945..2877.250 rows=99 loops=1)
  Group Key: a
  Peak Memory Usage: 5017 kB
  Disk Usage: 22992 kB
  HashAgg Batches: 84
  ->  Function Scan on generate_series a (actual time=314.507..741.517 
rows=99 loops=1)

A previous version of the docs said this, which I thought was confusing, and 
you removed it.
But I guess this is the behavior it was trying to .. explain.

+  enable_hashagg_disk (boolean)
+... This only affects the planner choice;
+execution time may still require using disk-based hash
+aggregation. The default is on.

I suggest that should be reworded and then re-introduced, unless there's some
further behavior change allowing the previous behavior of
might-exceed-work-mem.



Yeah, it would be good to mention this is a best-effort setting.


"This setting determines whether the planner will elect to use a hash plan
which it expects will exceed work_mem and spill to disk.  During execution,
hash nodes which exceed work_mem will spill to disk even if this setting is
disabled.  To avoid spilling to disk, either increase work_mem (or set
enable_hashagg=off)."

For sure the release notes should recommend re-calibrating work_mem.



I don't follow. Why would the recalibrating be needed?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Proposal] Global temporary tables

2020-04-09 Thread tushar

On 4/7/20 2:27 PM, 曾文旌 wrote:
Vacuum full GTT, cluster GTT is already 
supported in global_temporary_table_v24-pg13.patch.


Hi Wenjing,

Please refer to this scenario, where the REINDEX info message is not
shown the next time (after reconnecting to the database) for a GTT:


A)
--normal table
postgres=# create table nt(n int primary key);
CREATE TABLE
--GTT table
postgres=# create global temp table gtt(n int primary key);
CREATE TABLE
B)
--Reindex  , normal table
postgres=# REINDEX (VERBOSE) TABLE  nt;
INFO:  index "nt_pkey" was reindexed
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
REINDEX
--reindex GTT table
postgres=# REINDEX (VERBOSE) TABLE  gtt;
INFO:  index "gtt_pkey" was reindexed
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
REINDEX
C)
--Reconnect  to database
postgres=# \c
You are now connected to database "postgres" as user "tushar".
D) again perform step B)

postgres=# REINDEX (VERBOSE) TABLE  nt;
INFO:  index "nt_pkey" was reindexed
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
REINDEX
postgres=# REINDEX (VERBOSE) TABLE  gtt;   <-- message  not coming
REINDEX

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: [Proposal] Global temporary tables

2020-04-09 Thread 曾文旌


> 2020年4月9日 下午7:46,tushar  写道:
> 
> On 4/7/20 2:27 PM, 曾文旌 wrote:
>> Vacuum full GTT, cluster GTT is already supported in 
>> global_temporary_table_v24-pg13.patch.
> 
> Hi Wenjing,
> 
> Please refer this scenario , where reindex   message is not coming next time 
> ( after reconnecting to database) for GTT
> 
> A)
> --normal table
> postgres=# create table nt(n int primary key);
> CREATE TABLE
> --GTT table
> postgres=# create global temp table gtt(n int primary key);
> CREATE TABLE
> B)
> --Reindex  , normal table
> postgres=# REINDEX (VERBOSE) TABLE  nt;
> INFO:  index "nt_pkey" was reindexed
> DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
> REINDEX
> --reindex GTT table
> postgres=# REINDEX (VERBOSE) TABLE  gtt;
> INFO:  index "gtt_pkey" was reindexed
> DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
> REINDEX
> C)
> --Reconnect  to database
> postgres=# \c
> You are now connected to database "postgres" as user "tushar".
> D) again perform step B)
> 
> postgres=# REINDEX (VERBOSE) TABLE  nt;
> INFO:  index "nt_pkey" was reindexed
> DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
> REINDEX
> postgres=# REINDEX (VERBOSE) TABLE  gtt;   <-- message  not coming
> REINDEX
Yes.  Since the GTT storage file has not yet been initialized in the
newly established connection to the database, there is no info message.

> 
> -- 
> regards,tushar
> EnterpriseDB  https://www.enterprisedb.com/
> The Enterprise PostgreSQL Company



smime.p7s
Description: S/MIME cryptographic signature


Re: PG compilation error with Visual Studio 2015/2017/2019

2020-04-09 Thread Ranier Vilela
>On Wed, Apr 8, 2020 at 7:39 PM Juan José Santamaría Flecha

>> Let me explain further, in pg_config_os.h you can check that the value of
>> _WIN32_WINNT is solely based on checking _MSC_VER. This patch should also
>> be meaningful for WIN32 builds using MinGW, or we might see this issue
>> reappear in those systems if update the MIN_WINNT value to more current
>> OS versions. So, I still think _WIN32_WINNT is a better option.
>>
>Thanks for explanation, I was not aware of that, you are right it make
>sense to use " _WIN32_WINNT", Now I am using this only.

>I still see the same last lines in both #ifdef blocks, and pgindent might
>> change a couple of lines to:
>> + MultiByteToWideChar(CP_ACP, 0, winlocname, -1, wc_locale_name,
>> + LOCALE_NAME_MAX_LENGTH);
>> +
>> + if ((GetLocaleInfoEx(wc_locale_name, LOCALE_SNAME,
>> + (LPWSTR), LOCALE_NAME_MAX_LENGTH)) > 0)
>> + {
>>
>Now I have resolved these comments also, Please check updated version of
>the patch.

>> Please open an item in the commitfest for this patch.
>>
>I have created with same title.

Hi,

I have a few comments about the patch, if I may.

1. Variable rc runs the risk of being used uninitialized.

2. Variable loct has a redundant declaration ( = NULL).

3. Return "C", does not solve the first case?

Attached, your patch with those considerations.

regards,

Ranier VIlela


Re: Parallel copy

2020-04-09 Thread Amit Kapila
On Thu, Apr 9, 2020 at 3:55 AM Ants Aasma  wrote:
>
> On Wed, 8 Apr 2020 at 22:30, Robert Haas  wrote:
>
> > - The portion of the time that is used to split the lines is not
> > easily parallelizable. That seems to be a fairly small percentage for
> > a reasonably wide table, but it looks significant (13-18%) for a
> > narrow table. Such cases will gain less performance and be limited to
> > a smaller number of workers. I think we also need to be careful about
> > files whose lines are longer than the size of the buffer. If we're not
> > careful, we could get a significant performance drop-off in such
> > cases. We should make sure to pick an algorithm that seems like it
> > will handle such cases without serious regressions and check that a
> > file composed entirely of such long lines is handled reasonably
> > efficiently.
>
> I don't have a proof, but my gut feel tells me that it's fundamentally
> impossible to ingest csv without a serial line-ending/comment
> tokenization pass.
>

I think even if we try to do it via multiple workers it might not be
better.  In such a scheme, every worker needs to update the end
boundaries, and the next worker has to keep checking whether the
previous one has updated the end pointer.  I think this can add
significant synchronization effort for cases where tuples are around
100 bytes, which will be a common case.

> The current line splitting algorithm is terrible.
> I'm currently working with some scientific data where on ingestion
> CopyReadLineText() is about 25% on profiles. I prototyped a
> replacement that can do ~8GB/s on narrow rows, more on wider ones.
>

Good to hear.  I think that will be a good project on its own and that
might give a boost to parallel copy as with that we can further reduce
the non-parallelizable work unit.

> For rows that are consistently wider than the input buffer I think
> parallelism will still give a win - the serial phase is just memcpy
> through a ringbuffer, after which a worker goes away to perform the
> actual insert, letting the next worker read the data. The memcpy is
> already happening today, CopyReadLineText() copies the input buffer
> into a StringInfo, so the only extra work is synchronization between
> leader and worker.
>
>
> > - There could also be similar contention on the heap. Say the tuples
> > are narrow, and many backends are trying to insert tuples into the
> > same heap page at the same time. This would lead to many lock/unlock
> > cycles. This could be avoided if the backends avoid targeting the same
> > heap pages, but I'm not sure there's any reason to expect that they
> > would do so unless we make some special provision for it.
>
> I thought there already was a provision for that. Am I mis-remembering?
>

COPY uses heap_multi_insert to insert a batch of tuples, and I think
each batch should ideally use a different page; mostly it will be a
new page.  So I am not sure whether this will be a problem, or a
problem of a level that needs special handling.  But if it turns out
to be a problem, we definitely need some better way to deal with it.

> > - What else? I bet the above list is not comprehensive.
>
> I think parallel copy patch needs to concentrate on splitting input
> data to workers. After that any performance issues would be basically
> the same as a normal parallel insert workload. There may well be
> bottlenecks there, but those could be tackled independently.
>

I agree.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: pgbench - add pseudo-random permutation function

2020-04-09 Thread David Steele

On 4/2/20 3:01 AM, Alvaro Herrera wrote:

On 2020-Apr-02, Fabien COELHO wrote:


I'm planning to mark this patch RwF on April 8 and I suggest you
resubmit if you are able to get some consensus.


People interested in non-uniform benchmarks would see the point.  Why
many people would be happy with only uniform benchmarks, while life is
not uniform at all, escapes me.


I don't think we should boot this patch.  I don't think I would be able
to get this over the commit line in this CF, but let's not discard it.


Understood. I have moved the patch to the 2020-07 CF in Needs Review state.

Regards,
--
-David
da...@pgmasters.net




Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-04-09 Thread Tom Lane
Fujii Masao  writes:
> On 2020/04/09 16:11, Kyotaro Horiguchi wrote:
>> At Wed, 08 Apr 2020 16:35:46 -0400, Tom Lane  wrote in
>>> Why is this getting grafted onto BEGIN/START TRANSACTION in the
>>> first place?

>> The rationale for not being a fmgr function is stated in the following
>> comments. [...]

> This issue happens because the function is executed after BEGIN? If yes,
> what about executing the function (i.e., as separate transaction) before 
> BEGIN?
> If so, the snapshot taken in the function doesn't affect the subsequent
> transaction whatever its isolation level is.

I wonder whether making it a procedure, rather than a plain function,
would help any.

regards, tom lane




Re: [Proposal] Global temporary tables

2020-04-09 Thread Erik Rijkers

On 2020-04-09 15:28, 曾文旌 wrote:

[global_temporary_table_v25-pg13.patch]


Part of the problem is that some variables are only used by assert
statements; I have fixed those warnings.

Please provide your configure parameters, and I will verify it again.



Hi,

Just now I compiled the newer version of your patch (v25), and the 
warnings/notes that I saw earlier, are now gone. Thank you.



In case you still want it here is the configure:

-- [2020.04.09 15:06:45 global_temp_tables/1] ./configure  
--prefix=/home/aardvark/pg_stuff/pg_installations/pgsql.global_temp_tables 
--bindir=/home/aardvark/pg_stuff/pg_installations/pgsql.global_temp_tables/bin.fast 
--libdir=/home/aardvark/pg_stuff/pg_installations/pgsql.global_temp_tables/lib.fast 
--with-pgport=6975 --quiet --enable-depend --with-openssl --with-perl 
--with-libxml --with-libxslt --with-zlib  --enable-tap-tests  
--with-extra-version=_0409


-- [2020.04.09 15:07:13 global_temp_tables/1] make core: make --quiet -j 
4

partbounds.c: In function ‘partition_bounds_merge’:
partbounds.c:1024:21: warning: unused variable ‘inner_binfo’ 
[-Wunused-variable]

 1024 |  PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
  | ^~~
All of PostgreSQL successfully made. Ready to install.


Thanks,

Erik Rijkers









Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-09 Thread Robert Haas
On Thu, Apr 9, 2020 at 1:36 AM Amit Kapila  wrote:
> On Thu, Apr 9, 2020 at 7:07 AM Michael Paquier  wrote:
> > On Wed, Apr 08, 2020 at 01:38:54PM -0500, Justin Pryzby wrote:
> > > I think the behavior is correct, but the error message could be improved, 
> > > like:
> > > |ERROR:  cannot specify FULL with PARALLEL jobs
> > > or similar.
> >
> > Perhaps "cannot use FULL and PARALLEL options together"?
> >
>
> We already have a similar message "cannot specify both PARSER and COPY
> options", so I think the current message is fine.

So, whether the error message is OK depends on the details. The
situation as I understand it is that a vacuum cannot be both parallel
and full. If you disallow every command that includes both key words,
then the message seems fine. But suppose you allow

VACUUM (PARALLEL 1, FULL 0) foo;

There's no technical problem here, because the vacuum is not both
parallel and full. But if you allow that case, then there is an error
message problem, perhaps, because your error message says that you
cannot specify both options, but here you did specify both options,
and yet it worked. So really if this case is allowed a more accurate
error message would be something like:

ERROR: VACUUM FULL cannot be performed in parallel

But if you used this latter error message yet disallowed VACUUM
(PARALLEL 1, FULL 0) then it again wouldn't make sense, because the
error message would be now forbidding something that you never tried
to do.

The point is that we need to decide whether we're going to complain
whenever both options are specified in the syntax, or whether we're
going to complain when they're combined in a way that we don't
support. The error message we choose should match whatever decision we
make there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: A problem about partitionwise join

2020-04-09 Thread Ashutosh Bapat
>
> I think it would not work for outer joins if we only check
> exprs_known_equal() for equivalences. If the equi-join conditions
> involving pairs of matching partition keys are outer join quals
> mentioning nonnullable side rels, they would not exist in any EC
> according to the current EC infrastructure. So we still have to look
> through restrictlist.
>

When I wrote that function and even today, EC didn't accommodate outer
join equality conditions. If we can somehow do that,
have_partkey_equi_join() can be completely eliminated.
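
For illustration, the kind of qual being discussed (hypothetical
partitioned tables, partitions omitted): an equi-join on matching
partition keys that appears as an outer-join qual, so it lives in the
join's restrictlist rather than in an equivalence class:

CREATE TABLE p (a int) PARTITION BY RANGE (a);
CREATE TABLE q (a int) PARTITION BY RANGE (a);
SELECT * FROM p LEFT JOIN q ON p.a = q.a;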

-- 
Best Wishes,
Ashutosh Bapat




Re: backup manifests

2020-04-09 Thread Fujii Masao




On 2020/04/09 2:35, Robert Haas wrote:

On Wed, Apr 8, 2020 at 1:15 AM Fujii Masao  wrote:

When there is a backup_manifest in the database cluster, it's included in
the backup even when --no-manifest is specified. ISTM that this is problematic
because the backup_manifest is obviously not valid for the backup.
So, isn't it better to always exclude the *existing* backup_manifest in the
cluster from the backup, like backup_label/tablespace_map? Patch attached.

Also I found the typo in the document. Patch attached.


Both patches look good. The second one is definitely a mistake on my
part, and the first one seems like a totally reasonable change.
Thanks!


Thanks for reviewing them! I pushed them.

Please note that the commit messages have not been delivered to
pgsql-committers yet.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: backup manifests

2020-04-09 Thread Fujii Masao




On 2020/04/09 23:10, Stephen Frost wrote:

Greetings,

* Fujii Masao (masao.fu...@oss.nttdata.com) wrote:

On 2020/04/09 2:35, Robert Haas wrote:

On Wed, Apr 8, 2020 at 1:15 AM Fujii Masao  wrote:

When there is a backup_manifest in the database cluster, it's included in
the backup even when --no-manifest is specified. ISTM that this is problematic
because the backup_manifest is obviously not valid for the backup.
So, isn't it better to always exclude the *existing* backup_manifest in the
cluster from the backup, like backup_label/tablespace_map? Patch attached.

Also I found the typo in the document. Patch attached.


Both patches look good. The second one is definitely a mistake on my
part, and the first one seems like a totally reasonable change.
Thanks!


Thanks for reviewing them! I pushed them.

Please note that the commit messages have not been delivered to
pgsql-committers yet.


They've been released and your address whitelisted.


Many thanks!!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: DETACH PARTITION and FOR EACH ROW triggers on partitioned tables

2020-04-09 Thread Amit Langote
On Thu, Apr 9, 2020 at 3:09 AM Tom Lane  wrote:
> Alvaro Herrera  writes:
> > On 2020-Apr-08, Tom Lane wrote:
> >> I think that #1 would soon lead to needing all the same infrastructure
> >> as we have for inherited columns and constraints, ie triggers would need
> >> equivalents of attislocal and attinhcount.  I don't really want to go
> >> there, so I'd vote for #2.
>
> > Hmm.  Those things are used for the legacy inheritance case supporting
> > multiple inheritance, where we need to figure out which parent the table
> > is being detached (disinherited) from.  But for partitioning we know
> > which parent it is, since there can only be one.  So I don't think that
> > argument applies.
>
> My point is that so long as you only allow the case of exactly one parent,
> you can just delete the child trigger, because it must belong to that
> parent.  As soon as there's any flexibility, you are going to end up
> reinventing all the stuff we had to invent to manage
> maybe-or-maybe-not-inherited columns.  So I think the "detach" idea
> is the first step on that road, and I counsel not taking that step.

As mentioned upthread, we have behavior #1 for indexes (attach
existing / detach & keep), without any of the *islocal, *inhcount
infrastructure.  It is a bit complex, because we need logic to check
the equivalence of an existing index on the partition being attached,
so implementing the same behavior for triggers is going to be almost
as complex.  Considering that, #2 would be much simpler to implement,
but it would be asymmetric with everything else.

-- 

Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: PG compilation error with Visual Studio 2015/2017/2019

2020-04-09 Thread Ranier Vilela
On Thu, Apr 9, 2020 at 09:14, Juan José Santamaría Flecha <
juanjo.santama...@gmail.com> wrote:

>
> On Thu, Apr 9, 2020 at 1:56 PM Ranier Vilela  wrote:
>
>> Attached, your patch with those considerations.
>>
> I see no attachment.
>
Sorry, my mistake.

regards,
Ranier Vilela


0001-PG-compilation-error-with-VS-2015-2017-2019.patch
Description: Binary data


Re: Planning counters in pg_stat_statements (using pgss_store)

2020-04-09 Thread Julien Rouhaud
On Thu, Apr 9, 2020 at 5:59 AM Fujii Masao  wrote:
>
>
>
> On 2020/04/08 18:31, Julien Rouhaud wrote:
> > On Wed, Apr 08, 2020 at 05:37:27PM +0900, Fujii Masao wrote:
> >>
> >>
> >> On 2020/04/03 16:26, Julien Rouhaud wrote:
> >>> On Thu, Apr 02, 2020 at 01:04:28PM -0700, legrand legrand wrote:
>  Fujii Masao-4 wrote
> > On 2020/04/01 18:19, Fujii Masao wrote:
> >
> > Finally I pushed the patch!
> > Many thanks for all involved in this patch!
> >
> > As a remaining TODO item, I'm thinking that the document would need to
> > be improved. For example, previously the query was not stored in pgss
> > when it failed. But, in v13, if pgss_planning is enabled, such a query 
> > is
> > stored because the planning succeeds. Without the explanation about
> > that behavior in the document, I'm afraid that users will get confused.
> > Thought?
> 
>  Thank you all for this work and especially to Julian for its major
>  contribution !
> >>>
> >>>
> >>> Thanks a lot to everyone!  This was quite a long journey.
> >>>
> >>>
>  Regarding the TODO point: Yes I agree that it can be improved.
>  My proposal:
> 
>  "Note that planning and execution statistics are updated only at their
>  respective end phase, and only for successfull operations.
>  For exemple executions counters of a long running SELECT query,
>  will be updated at the execution end, without showing any progress
>  report in the interval.
>  Other exemple, if the statement is successfully planned but fails in
>  the execution phase, only its planning statistics are stored.
>  This may give uncorrelated plans vs calls informations."
> >>
> >> Thanks for the proposal!
> >>
> >>> There are numerous reasons for lack of correlation between number of 
> >>> planning
> >>> and number of execution, so I'm afraid that this will give users the false
> >>> impression that only failed execution can lead to that.
> >>>
> >>> Here's some enhancement on your proposal:
> >>>
> >>> "Note that planning and execution statistics are updated only at their
> >>> respective end phase, and only for successful operations.
> >>> For example the execution counters of a long running query
> >>> will only be updated at the execution end, without showing any progress
> >>> report before that.
> >>
> >> Probably since this is not the example for explaining the relationship of
> >> planning and execution stats, it's better to explain this separately or 
> >> just
> >> drop it?
> >>
> >> What about the attached patch based on your proposals?
> >>
> >
> > Thanks Fuji-san, it looks perfect to me!
>
> Thanks for the check! Pushed!

Thanks a lot Fuji-san!

For the record, the commit is available, but I didn't receive the
usual mail, and it's also not present in the archives apparently:
https://www.postgresql.org/list/pgsql-committers/since/20200409/
(although Amit's latest commit was delivered as expected).

Given your previous discussion with Magnus, I'm assuming that your
address is now allowed to post for a year. I'm not sure what went
wrong here, so I'm adding Magnus in Cc.




Re: Report error position in partition bound check

2020-04-09 Thread Tom Lane
While I'm quite on board with providing useful error cursors,
the example cases in this patch don't seem all that useful:

 -- trying to create range partition with empty range
 CREATE TABLE fail_part PARTITION OF range_parted2 FOR VALUES FROM (1) TO (0);
 ERROR:  empty range bound specified for partition "fail_part"
+LINE 1: ...E fail_part PARTITION OF range_parted2 FOR VALUES FROM (1) T...
+ ^
 DETAIL:  Specified lower bound (1) is greater than or equal to upper bound (0).

As best I can tell from these examples, the cursor will always
point at the FROM keyword, making it pretty unhelpful.  It seems
like in addition to getting the query string passed down, you
need to do some work on the code that's actually reporting the
error position.  I'd expect at a minimum that the pointer allows
identifying which column of a multi-column partition key is
giving trouble.  The phrasing of this particular message, for
example, suggests that it ought to point at the "1" expression.

regards, tom lane




Re: Planning counters in pg_stat_statements (using pgss_store)

2020-04-09 Thread Fujii Masao




On 2020/04/09 22:31, Julien Rouhaud wrote:

On Thu, Apr 9, 2020 at 5:59 AM Fujii Masao  wrote:




On 2020/04/08 18:31, Julien Rouhaud wrote:

On Wed, Apr 08, 2020 at 05:37:27PM +0900, Fujii Masao wrote:



On 2020/04/03 16:26, Julien Rouhaud wrote:

On Thu, Apr 02, 2020 at 01:04:28PM -0700, legrand legrand wrote:

Fujii Masao-4 wrote

On 2020/04/01 18:19, Fujii Masao wrote:

Finally I pushed the patch!
Many thanks for all involved in this patch!

As a remaining TODO item, I'm thinking that the document would need to
be improved. For example, previously the query was not stored in pgss
when it failed. But, in v13, if pgss_planning is enabled, such a query is
stored because the planning succeeds. Without the explanation about
that behavior in the document, I'm afraid that users will get confused.
Thought?


Thank you all for this work and especially to Julian for its major
contribution !



Thanks a lot to everyone!  This was quite a long journey.



Regarding the TODO point: Yes I agree that it can be improved.
My proposal:

"Note that planning and execution statistics are updated only at their
respective end phase, and only for successfull operations.
For exemple executions counters of a long running SELECT query,
will be updated at the execution end, without showing any progress
report in the interval.
Other exemple, if the statement is successfully planned but fails in
the execution phase, only its planning statistics are stored.
This may give uncorrelated plans vs calls informations."


Thanks for the proposal!


There are numerous reasons for lack of correlation between number of planning
and number of execution, so I'm afraid that this will give users the false
impression that only failed execution can lead to that.

Here's some enhancement on your proposal:

"Note that planning and execution statistics are updated only at their
respective end phase, and only for successful operations.
For example the execution counters of a long running query
will only be updated at the execution end, without showing any progress
report before that.


Probably since this is not the example for explaining the relationship of
planning and execution stats, it's better to explain this separately or just
drop it?

What about the attached patch based on your proposals?



Thanks Fuji-san, it looks perfect to me!


Thanks for the check! Pushed!


Thanks a lot Fuji-san!

For the record, the commit is available, but I didn't receive the
usual mail, and it's also not present in the archives apparently:
https://www.postgresql.org/list/pgsql-committers/since/20200409/
(although Amit's latest commit was delivered as expected).


Yes.


Given your previous discussion with Magnus, I'm assuming that your
address is now allowed to post for a year. I'm not sure what went
wrong here, so I'm adding Magnus in Cc.


Thanks! I also reported the issue in pgsql-www.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: backup manifests

2020-04-09 Thread Stephen Frost
Greetings,

* Fujii Masao (masao.fu...@oss.nttdata.com) wrote:
> On 2020/04/09 2:35, Robert Haas wrote:
> >On Wed, Apr 8, 2020 at 1:15 AM Fujii Masao  
> >wrote:
> >>When there is a backup_manifest in the database cluster, it's included in
> >>the backup even when --no-manifest is specified. ISTM that this is 
> >>problematic
> >>because the backup_manifest is obviously not valid for the backup.
> >>So, isn't it better to always exclude the *existing* backup_manifest in the
> >>cluster from the backup, like backup_label/tablespace_map? Patch attached.
> >>
> >>Also I found the typo in the document. Patch attached.
> >
> >Both patches look good. The second one is definitely a mistake on my
> >part, and the first one seems like a totally reasonable change.
> >Thanks!
> 
> Thanks for reviewing them! I pushed them.
> 
> Please note that the commit messages have not been delivered to
> pgsql-committers yet.

They've been released and your address whitelisted.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: ALTER TABLE ... SET STORAGE does not propagate to indexes

2020-04-09 Thread Peter Eisentraut

On 2020-03-30 18:17, Alvaro Herrera wrote:

On 2020-Feb-25, Peter Eisentraut wrote:

An alternative would be that we make this situation fully supported. Then
we'd probably need at least ALTER INDEX ... ALTER COLUMN ... SET STORAGE,
and some pg_dump support.


I think this is a more promising direction.


I have started implementing the ALTER INDEX command, which by itself 
isn't very hard, but it requires significant new infrastructure in 
pg_dump, and probably also a bit of work in psql, and that's all a bit 
too much right now.


An alternative for the short term is the attached patch.  It's the same 
as before, but I have hacked up the test_decoding test to achieve the 
effect of ALTER INDEX with direct catalog manipulation.  This preserves 
the spirit of the test case, but allows us to fix everything else about 
this situation.


One thing to remember is that the current situation is broken.  While 
you can set index columns to have different storage than the 
corresponding table columns, pg_dump does not preserve that, because it 
dumps indexes after ALTER TABLE commands.  So at the moment, having 
these two things different isn't really supported.  The proposed patch 
just makes this behave consistently and allows adding an ALTER INDEX 
command later on if desired.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From b349cb8ff56c8ec547420994fb037d6c4b397193 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 9 Apr 2020 14:10:01 +0200
Subject: [PATCH v3] Propagate ALTER TABLE ... SET STORAGE to indexes

When creating a new index, the attstorage setting of the table column
is copied to regular (non-expression) index columns.  But a later
ALTER TABLE ... SET STORAGE is not propagated to indexes, thus
creating an inconsistent and undumpable state.

Discussion: 
https://www.postgresql.org/message-id/flat/9765d72b-37c0-06f5-e349-2a580aafd989%402ndquadrant.com
---
 contrib/test_decoding/expected/toast.out  |  4 ++
 contrib/test_decoding/sql/toast.sql   |  5 +++
 src/backend/commands/tablecmds.c  | 47 +++
 src/test/regress/expected/alter_table.out | 20 ++
 src/test/regress/expected/vacuum.out  |  4 +-
 src/test/regress/sql/alter_table.sql  |  6 +++
 src/test/regress/sql/vacuum.sql   |  4 +-
 7 files changed, 86 insertions(+), 4 deletions(-)

diff --git a/contrib/test_decoding/expected/toast.out 
b/contrib/test_decoding/expected/toast.out
index 91a9a1e86d..75c4d22d80 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -305,6 +305,10 @@ ALTER TABLE toasted_several REPLICA IDENTITY FULL;
 ALTER TABLE toasted_several ALTER COLUMN toasted_key SET STORAGE EXTERNAL;
 ALTER TABLE toasted_several ALTER COLUMN toasted_col1 SET STORAGE EXTERNAL;
 ALTER TABLE toasted_several ALTER COLUMN toasted_col2 SET STORAGE EXTERNAL;
+-- Change the storage of the index back to EXTENDED, separately from
+-- the table.  This is currently not doable via DDL, but it is
+-- supported internally.
+UPDATE pg_attribute SET attstorage = 'x' WHERE attrelid = 
'toasted_several_pkey'::regclass AND attname = 'toasted_key';
 INSERT INTO toasted_several(toasted_key) VALUES(repeat('9876543210', 1));
 SELECT pg_column_size(toasted_key) > 2^16 FROM toasted_several;
  ?column? 
diff --git a/contrib/test_decoding/sql/toast.sql 
b/contrib/test_decoding/sql/toast.sql
index ef11d2bfff..016c3ab784 100644
--- a/contrib/test_decoding/sql/toast.sql
+++ b/contrib/test_decoding/sql/toast.sql
@@ -279,6 +279,11 @@ CREATE TABLE toasted_several (
 ALTER TABLE toasted_several ALTER COLUMN toasted_col1 SET STORAGE EXTERNAL;
 ALTER TABLE toasted_several ALTER COLUMN toasted_col2 SET STORAGE EXTERNAL;
 
+-- Change the storage of the index back to EXTENDED, separately from
+-- the table.  This is currently not doable via DDL, but it is
+-- supported internally.
+UPDATE pg_attribute SET attstorage = 'x' WHERE attrelid = 
'toasted_several_pkey'::regclass AND attname = 'toasted_key';
+
 INSERT INTO toasted_several(toasted_key) VALUES(repeat('9876543210', 1));
 SELECT pg_column_size(toasted_key) > 2^16 FROM toasted_several;
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6162fb018c..8b0194e17f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -7382,6 +7382,7 @@ ATExecSetStorage(Relation rel, const char *colName, Node *newValue, LOCKMODE loc
Form_pg_attribute attrtuple;
AttrNumber  attnum;
ObjectAddress address;
+   ListCell   *lc;
 
Assert(IsA(newValue, String));
storagemode = strVal(newValue);
@@ -7441,6 +7442,52 @@ ATExecSetStorage(Relation rel, const char *colName, Node *newValue, LOCKMODE loc
 
heap_freetuple(tuple);
 
+   /*
+* Apply the change to indexes as well (only for simple index columns,
+* matching 

Re: DETACH PARTITION and FOR EACH ROW triggers on partitioned tables

2020-04-09 Thread Tom Lane
Amit Langote  writes:
> On Thu, Apr 9, 2020 at 3:09 AM Tom Lane  wrote:
>> My point is that so long as you only allow the case of exactly one parent,
>> you can just delete the child trigger, because it must belong to that
>> parent.  As soon as there's any flexibility, you are going to end up
>> reinventing all the stuff we had to invent to manage
>> maybe-or-maybe-not-inherited columns.  So I think the "detach" idea
>> is the first step on that road, and I counsel not taking that step.

> As mentioned upthread, we have behavior #1 for indexes (attach
> existing / detach & keep), without any of the *islocal, *inhcount
> infrastructure. It is a bit complex, because we need logic to check
> the equivalence of an existing index on the partition being attached,
> so implementing the same behavior for triggers is going to have to be
> almost as complex.  Considering that, #2 will be much simpler to
> implement, but would be asymmetric with everything else.

I think there is justification for jumping through some hoops for
indexes, because they can be extremely expensive to recreate.
The same argument doesn't hold even a little bit for child
triggers, though.

Also it can be expected that an index will still behave sensibly after
its table is standalone, whereas that's far from obvious for a trigger
that was meant to work on partition member tables.
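
To make the scenario concrete (names invented for illustration; this is not
from any posted patch):

CREATE TABLE measurements (ts timestamptz, val numeric)
  PARTITION BY RANGE (ts);
CREATE TABLE measurements_2020 PARTITION OF measurements
  FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');

CREATE FUNCTION note_insert() RETURNS trigger LANGUAGE plpgsql AS
$$ BEGIN RAISE NOTICE 'insert on %', TG_TABLE_NAME; RETURN NEW; END $$;

-- A row-level trigger on the partitioned parent is cloned onto each
-- partition.
CREATE TRIGGER measurements_note AFTER INSERT ON measurements
  FOR EACH ROW EXECUTE FUNCTION note_insert();

-- The question in this thread: what should happen to the cloned trigger
-- on measurements_2020 once it is no longer a partition?
ALTER TABLE measurements DETACH PARTITION measurements_2020;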

regards, tom lane




Re: Report error position in partition bound check

2020-04-09 Thread Ashutosh Bapat
Hi Alexandra,
As Michael said, it will be considered for the next commitfest. But
from a quick glance, a suggestion.
Instead of passing a NULL parse state from ATExecAttachPartition, pass
make_parsestate(NULL). parser_errposition() takes care of a NULL parse
state input, but it might be safer this way. Better still if we could
cook up a parse state with the query text from the
AlterTableUtilityContext that is available in ATExecCmd().

On Thu, Apr 9, 2020 at 6:36 AM Alexandra Wang  wrote:
>
> Forgot to run make installcheck. Here's the new version of the patch that 
> updated the test answer file.
>


-- 
Best Wishes,
Ashutosh Bapat




Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-09 Thread Ashutosh Bapat
On Thu, Apr 9, 2020 at 12:03 PM Etsuro Fujita  wrote:
>
> On Thu, Apr 9, 2020 at 2:36 PM Tom Lane  wrote:
> > Etsuro Fujita  writes:
> > > Yeah, partition_bounds_merge() is currently called only from
> > > try_partitionwise_join(), which guarantees that the strategies are the
> > > same.
>
> > If there's only one caller and there's not likely to ever be more,
> > then I tend to agree that you don't need the assertion.
>
> It seems unlikely that partition_bounds_merge() will be called from
> more places in the foreseeable future, so I'd still vote for removing
> the assertion.

When I wrote that function, I had UNION also in mind. A UNION across
multiple partitioned relations will be partitioned if we can merge the
partition bounds in a sensible manner. Of course the current structure
of that function looks more purposed for joins, but it's not difficult
to convert it to be used for UNION as well. In that case that set of
functions will have many more callers. So, I will vote to keep that
assertion now that we have it there.
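
For example (a sketch with made-up tables, not from the patch), this is the
kind of query that could benefit if the merged bounds were usable outside of
joins:

CREATE TABLE sales_2019 (sold_on date, amount numeric)
  PARTITION BY RANGE (sold_on);
CREATE TABLE sales_2020 (sold_on date, amount numeric)
  PARTITION BY RANGE (sold_on);

-- If the bounds of the two partitioned tables can be merged sensibly, the
-- UNION ALL result could itself be treated as partitioned, so that a later
-- join or aggregation can run partition by partition.
SELECT sold_on, amount FROM sales_2019
UNION ALL
SELECT sold_on, amount FROM sales_2020;
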
-- 
Best Wishes,
Ashutosh Bapat




Re: Report error position in partition bound check

2020-04-09 Thread Amit Langote
On Thu, Apr 9, 2020 at 10:51 PM Ashutosh Bapat
 wrote:
>
> Hi Alexandra,
> As Michael said, it will be considered for the next commitfest. But
> from a quick glance, a suggestion.
> Instead of passing a NULL parse state from ATExecAttachPartition, pass
> make_parsestate(NULL). parser_errposition() takes care of a NULL parse
> state input, but it might be safer this way. Better still if we could
> cook up a parse state with the query text from the
> AlterTableUtilityContext that is available in ATExecCmd().

+1.  Maybe pass the *context* down to ATExecAttachPartition() from
ATExecCmd() rather than a ParseState.

-- 

Amit Langote
EnterpriseDB: http://www.enterprisedb.com




cleaning perl code

2020-04-09 Thread Andrew Dunstan


We currently only run perlcritic at severity level 5, which is fairly
permissive. I'd like to reduce that, ideally to, say, level 3, which is
what I use for the buildfarm code.

But let's start by going to severity level 4. Given this perlcriticrc,
derived from the buildfarm's:


# for policy descriptions see
# https://metacpan.org/release/Perl-Critic

severity = 4

theme = core

# allow octal constants with leading zeros
[-ValuesAndExpressions::ProhibitLeadingZeros]

# allow assignments to %ENV and %SIG without 'local'
[Variables::RequireLocalizedPunctuationVars]
allow = %ENV %SIG

# allow 'no warnings qw(once)'
[TestingAndDebugging::ProhibitNoWarnings]
allow = once

# allow opened files to stay open for more than 9 lines of code
[-InputOutput::RequireBriefOpen]

Here's a summary of the perlcritic warnings:


 39 Always unpack @_ first
 30 Code before warnings are enabled
 12 Subroutine "new" called using indirect syntax
  9 Multiple "package" declarations
  9 Expression form of "grep"
  7 Symbols are exported by default
  5 Warnings disabled
  4 Magic variable "$/" should be assigned as "local"
  4 Comma used to separate statements
  2 Readline inside "for" loop
  2 Pragma "constant" used
  2 Mixed high and low-precedence booleans
  2 Don't turn off strict for large blocks of code
  1 Magic variable "@a" should be assigned as "local"
  1 Magic variable "$|" should be assigned as "local"
  1 Magic variable "$\" should be assigned as "local"
  1 Magic variable "$?" should be assigned as "local"
  1 Magic variable "$," should be assigned as "local"
  1 Magic variable "$"" should be assigned as "local"
  1 Expression form of "map"

which isn't a huge number.

I'm going to start posting patches to address these issues, and when
we're done we can lower the severity level and start again on the level
3s :-)


cheers


andrew






-- 

Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-09 Thread Tomas Vondra

On Thu, Apr 09, 2020 at 07:34:01PM +0530, Ashutosh Bapat wrote:

On Thu, Apr 9, 2020 at 12:03 PM Etsuro Fujita  wrote:


On Thu, Apr 9, 2020 at 2:36 PM Tom Lane  wrote:
> Etsuro Fujita  writes:
> > Yeah, partition_bounds_merge() is currently called only from
> > try_partitionwise_join(), which guarantees that the strategies are the
> > same.

> If there's only one caller and there's not likely to ever be more,
> then I tend to agree that you don't need the assertion.

It seems unlikely that partition_bounds_merge() will be called from
more places in the foreseeable future, so I'd still vote for removing
the assertion.


When I wrote that function, I had UNION also in mind. A UNION across
multiple partitioned relations will be partitioned if we can merge the
partition bounds in a sensible manner. Of course the current structure
of that function looks more purposed for joins, but it's not difficult
to convert it to be used for UNION as well. In that case that set of
functions will have many more callers. So, I will vote to keep that
assertion now that we have it there.


Yeah. I really don't see why we should remove an assertion that enforces
something useful, especially when it's just a plain comparison. Had it
been some expensive assert, maybe. But how much slower does this make
an assert-enabled build? 0.0001% or something like that?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services