get rid of distprep?

2020-08-08 Thread Peter Eisentraut
I'm thinking about whether we should get rid of the distprep target, the 
step in the preparation of the official source tarball that creates a 
bunch of prebuilt files using bison, flex, perl, etc. for inclusion in 
the tarball.  I think this concept is no longer a good fit for 
contemporary software distribution.


There is a lot of interest these days in making the artifacts of 
software distribution traceable, for security and legal reasons.  You 
can trace the code from an author into Git, from Git into a tarball, 
somewhat from a tarball into a binary package (for example using 
reproducible builds), from a binary package onto a user's system. 
Having some mystery prebuilt files in the middle there does not feel 
right.  Packaging guidelines nowadays tend to disfavor such practices 
and either suggest, recommend, or require removing and rebuilding such 
files.  This whole thing was fairly cavalier when we shipped gram.c, 
scan.c, and one or two other files, but now the number of prebuilt files 
is more than 100, not including the documentation, so this is a bit more 
serious.


Practically, who even uses source tarballs these days?  They are a 
vehicle for packagers, but packagers are not really helped by adding a 
bunch of prebuilt files.  I think this practice started before there 
even were things like rpm.  Nowadays, most people who want to work with 
the source should and probably do use git, so making the difference 
between a git checkout and a source tarball smaller would probably be 
good.  And it would also make the actual tarball smaller.


The practical costs of this are also not negligible.  Because of the 
particular way configure handles bison and flex, it regularly happens 
on new and test systems that the build proceeds and only later tells 
you that you should have installed bison 5 minutes ago.  Also, 
extensions cannot rely on bison, flex, or perl being available; yet it 
often happens to work anyway, so the issue is rarely dealt with 
correctly.


Who benefits from these prebuilt files?  I doubt anyone actually has 
problems obtaining useful installations of bison, flex, or perl.  There 
is the documentation build, but that also seems pretty robust nowadays 
and in any case you don't need to build the documentation to get a 
useful installation.  We could make some adjustments so that skipping 
the documentation build is easier.  The only users of this 
would appear to be those not using git and not using any packaging. 
That number is surely not zero, but it's probably very small and doesn't 
seem worth catering to specifically.


Thoughts?

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Should the nbtree page split REDO routine's locking work more like the locking on the primary?

2020-08-08 Thread Andrey M. Borodin



> On 8 Aug 2020, at 03:28, Peter Geoghegan wrote:
> 
> On Thu, Aug 6, 2020 at 7:00 PM Peter Geoghegan  wrote:
>> On Thu, Aug 6, 2020 at 6:08 PM Tom Lane  wrote:
>>> +1 for making this more like what happens in original execution ("on the
>>> primary", to use your wording).  Perhaps what you suggest here is still
>>> not enough like the original execution, but it sounds closer.
>> 
>> It won't be the same as the original execution, exactly -- I am only
>> thinking of holding on to same-level page locks (the original page,
>> its new right sibling, and the original right sibling).
> 
> I pushed a commit that reorders the lock acquisitions within
> btree_xlog_unlink_page() -- they're now consistent with _bt_split()
> (at least among sibling pages involved in the page split).

Sounds great, thanks!
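
If I read the commit right, the sibling locking order now shared between
_bt_split() and the REDO routine is roughly this (an illustrative sketch
with made-up helper names, not the actual code):

/* Same-level pages are locked left to right, as in _bt_split(): */
lock_buffer(orig_page);        /* the page being split */
lock_buffer(new_right_page);   /* its newly allocated right sibling */
lock_buffer(old_right_page);   /* the original right sibling, whose
                                * left-link must be updated */
/* ... update sibling links, log/replay, then release all three ... */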

Best regards, Andrey Borodin.



Re: [PATCH] Covering SPGiST index

2020-08-08 Thread Andrey M. Borodin



> On 7 Aug 2020, at 16:59, Pavel Borisov wrote:
> 
> As usual I very much appreciate your feedback

Thanks for the patch! Looks interesting.

At first glance, the whole concept of a non-multicolumn index with included 
attributes seems... well, just difficult to understand.
But I expect that for SP-GiST this must be a single key with multiple included 
attributes, right?
I couldn't find a test that checks the impossibility of a two-column SP-GiST 
index, only a few asserts about it. Is this checked somewhere else?

Thanks!

Best regards, Andrey Borodin.



Re: [Patch] Optimize dropping of relation buffers using dlist

2020-08-08 Thread Amit Kapila
On Fri, Aug 7, 2020 at 9:33 AM Tom Lane  wrote:
>
> Amit Kapila  writes:
> > On Sat, Aug 1, 2020 at 1:53 AM Andres Freund  wrote:
> >> We could also just use pg_class.relpages. It'll probably mostly be
> >> accurate enough?
>
> > Don't we need the accurate 'number of blocks' if we want to invalidate
> > all the buffers? Basically, I think we need to perform BufTableLookup
> > for all the blocks in the relation and then Invalidate all buffers.
>
> Yeah, there is no room for "good enough" here.  If a dirty buffer remains
> in the system, the checkpointer will eventually try to flush it, and fail
> (because there's no file to write it to), and then checkpointing will be
> stuck.  So we cannot afford to risk missing any buffers.
>

Right, this reminds me of the discussion we had last time on this
topic, where we decided that we can't even rely on smgrnblocks to
find the exact number of blocks, because lseek might lie about the EOF
position [1]. So we anyway need some mechanism to push the
information about "to be truncated or dropped" relations to the
background worker (checkpointer and/or others) to avoid flush
issues. But maybe it is better to also push the responsibility of
invalidating the buffers of truncated/dropped relations to the
background process. However, I feel that for cases where the relation
size is greater than the number of shared buffers, there might not be
much benefit in pushing this operation to the background, unless there
are already a few other relation entries (for dropped relations) so
that the cost of scanning the buffers can be amortized.
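
For reference, the per-block lookup-and-invalidate idea would look
roughly like the sketch below (simplified; the real code must keep the
buffer-mapping partition lock rules in mind and re-check the buffer tag
under the buffer header lock before invalidating):

for (BlockNumber blk = 0; blk < nblocks; blk++)
{
    BufferTag   tag;
    uint32      hash;
    int         buf_id;

    /* tag for block 'blk' of the given relation fork */
    INIT_BUFFERTAG(tag, rnode, forkNum, blk);
    hash = BufTableHashCode(&tag);

    LWLockAcquire(BufMappingPartitionLock(hash), LW_SHARED);
    buf_id = BufTableLookup(&tag, hash);
    LWLockRelease(BufMappingPartitionLock(hash));

    if (buf_id >= 0)
        InvalidateBuffer(GetBufferDescriptor(buf_id));  /* simplified */
}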

[1] - https://www.postgresql.org/message-id/16664.1435414204%40sss.pgh.pa.us

-- 
With Regards,
Amit Kapila.




Re: [Patch] Optimize dropping of relation buffers using dlist

2020-08-08 Thread Amit Kapila
On Fri, Aug 7, 2020 at 11:03 PM Robert Haas  wrote:
>
> On Fri, Aug 7, 2020 at 12:52 PM Tom Lane  wrote:
> > At least in the case of segment zero, the file will still exist.  It'll
> > have been truncated to zero length, and if the filesystem is stupid about
> > holes in files then maybe a write to a high block number would consume
> > excessive disk space, but does anyone still care about such filesystems?
> > I don't remember at the moment how we handle higher segments,
> >

We do unlink them, and we register a request to forget the fsync
requests for those. See mdunlinkfork.

> > but likely
> > we could make them still exist too, postponing all the unlinks till after
> > checkpoint.  Or we could just have the backends give up on recycling a
> > particular buffer if they can't write it (which is the response to an I/O
> > failure already, I hope).
> >

Note that we don't often try to flush the buffers from the backend. We
first try to forward the request to the checkpointer's queue, and only
if that queue is full does the backend try to flush the buffer itself;
so even if we decide to give up flushing such a buffer (where we get an
error) in the backend, it shouldn't impact very many cases. I am not
sure, but if we can somehow reliably distinguish this type of error
from other I/O failures, then we could probably give up on flushing
this buffer and continue, or maybe just retry pushing the request to
the checkpointer.

>
> None of this sounds very appealing. Postponing the unlinks means
> postponing recovery of the space at the OS level, which I think will
> be noticeable and undesirable for users. The other notions all seem to
> involve treating as valid on-disk states we currently treat as
> invalid, and our sanity checks in this area are already far too weak.
> And all you're buying for it is putting a hash table that would
> otherwise be shared memory into backend-private memory, which seems
> like quite a minor gain. Having that information visible to everybody
> seems a lot cleaner.
>

One more benefit of giving this responsibility to a single process
like the checkpointer is that we can avoid unlinking the relation
until we have scanned all the buffers corresponding to it. Now, surely
keeping the information in shared memory and allowing other processes
to work on it has its own merit, namely that such buffers might get
invalidated faster; but I am not sure we can then retain the benefit
of the other approach, which is to perform all such buffer
invalidation before unlinking the relation's first segment.

-- 
With Regards,
Amit Kapila.




Re: LSM tree for Postgres

2020-08-08 Thread Konstantin Knizhnik




On 07.08.2020 15:31, Alexander Korotkov wrote:

> On Wed, 5 Aug 2020 at 09:13, Konstantin Knizhnik wrote:
>
>> Concerning degradation of the base index - the B-Tree itself is a
>> balanced tree. Yes, insertion of random keys can cause splits of
>> B-Tree pages.
>> In the worst case half of each B-Tree page will be empty, so the
>> B-Tree size will be two times larger than the ideal tree.
>> It may cause degradation of up to two times. But that is all. There
>> should not be infinite degradation of speed tending to zero.
>
> My concerns are not just about space utilization.  My main concern is
> about the order of the pages.  After the first merge the base index
> will be filled in key order.  So physical page ordering perfectly
> matches their logical ordering.  After the second merge some pages of
> the base index split, and new pages are added to the end of the index.
> Splits also happen in key order.  So, now physical and logical
> orderings match within two extents corresponding to the first and
> second merges, but not within the whole tree.  While there are only a
> few such extents, disk page reads may in fact be mostly sequential,
> thanks to OS cache and readahead.  But finally, after many merges, we
> can end up with mostly random page reads.  For instance, leveldb
> doesn't have the problem of ordering degradation, because it stores
> levels in sorted files.

I agree with you that losing the sequential order of B-Tree pages may have 
a negative impact on performance.
But it is first of all critical for order-by and range queries, when we 
have to traverse several subsequent leaf pages.
It is less critical for exact-search or delete/insert operations. 
The efficiency of merge operations mostly depends on how many keys
will be stored at the same B-Tree page. And it is first of all 
determined by the size of the top index and the key distribution.
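
Just to make the pattern concrete, the two-level insert path I have in
mind looks roughly like this (illustrative pseudocode with made-up
names; neither Lsm3 nor RocksDB code):

void
lsm_insert(Index *top, Index *base, Key key, Value value)
{
    /* insert into the small top index, which is hot and likely cached */
    index_insert(top, key, value);

    if (index_size(top) > merge_threshold)
    {
        /*
         * Bulk-merge the top index into the base index in key order.
         * This is where base pages split and, over many merges, the
         * physical page order can degrade as described above.
         */
        merge_in_key_order(base, top);
        index_truncate(top);
    }
}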







Re: get rid of distprep?

2020-08-08 Thread Tom Lane
Peter Eisentraut  writes:
> I'm thinking about whether we should get rid of the distprep target, ...

> Who benefits from these prebuilt files?  I doubt anyone actually has 
> problems obtaining useful installations of bison, flex, or perl.

I'm sure it was a bigger issue twenty years ago, but yeah, nowadays
our minimum requirements for those tools are so ancient that everybody
who cares to build from source should have usable versions available.

I think the weak spot in your argument, though, is the documentation.
There is basically nothing that is standardized or reproducible in
that toolchain, as every platform names and subdivides the relevant
packages differently, if they exist at all.  I was reminded of that
just recently when I updated my main workstation to RHEL8, and had to
jump through a lot of hoops to get everything installed that's needed
to build the docs (and I still lack the tools for some of the weirder
products such as epub).  I'd be willing to say "you must have bison,
flex, and perl to build" --- and maybe we could even avoid having a
long discussion about what "perl" means in this context --- but I
fear the doc tools situation would be a mess.

> The only users of this 
> would appear to be those not using git and not using any packaging. 

No, there's the packagers themselves who would be bearing the brunt of
rediscovering how to build the docs on their platforms.  And if the
argument is that there's a benefit to them of making the build more
reproducible, I'm not sure I buy it, because of (1) timestamps in the
output files and (2) docbook's willingness to try to download missing
bits off the net.  (2) is a huge and not very obvious hazard to
reproducibility.

But maybe you ought to be surveying -packagers about the question
instead of theorizing here.  Would *they* see this as a net benefit?

One other point to consider is that distprep or no distprep, I'd be
quite sad if the distclean target went away.  That's extremely useful
in normal development workflows to tear down everything that depends
on configure output, without giving up some of the more expensive
build products such as gram.c and preproc.c.

regards, tom lane




Re: Replace remaining StrNCpy() by strlcpy()

2020-08-08 Thread Tom Lane
Peter Eisentraut  writes:
> I removed namecpy() altogether because you can just use struct assignment.

Makes sense, and I notice it was unused anyway.

v3 passes eyeball examination (I didn't bother running tests), with
only one remaining nit: the proposed commit message says

They are equivalent,

which per this thread is incorrect.  Somebody might possibly refer to this
commit for guidance in updating third-party code, so I don't think we want
to leave a misleading claim here.  Perhaps something like

They are equivalent, except that StrNCpy zero-fills the entire
destination buffer instead of providing just one trailing zero.
For all but a tiny number of callers, that's just overhead rather
than being desirable.
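
For the archives, here's a tiny self-contained demonstration of the
difference (illustrative code, not from our tree; my_strlcpy stands in
for the src/port/strlcpy.c implementation):

#include <stdio.h>
#include <string.h>

/* Local stand-in for strlcpy(), which glibc does not provide. */
static size_t
my_strlcpy(char *dst, const char *src, size_t siz)
{
    size_t      len = strlen(src);

    if (siz > 0)
    {
        size_t      n = (len >= siz) ? siz - 1 : len;

        memcpy(dst, src, n);
        dst[n] = '\0';          /* exactly one trailing zero */
    }
    return len;
}

int
main(void)
{
    char        a[8],
                b[8];

    memset(a, 'X', sizeof(a));
    memset(b, 'X', sizeof(b));

    /* StrNCpy was strncpy() plus forced termination; strncpy()
     * zero-fills the entire remainder of the destination. */
    strncpy(a, "hi", sizeof(a));
    a[sizeof(a) - 1] = '\0';

    my_strlcpy(b, "hi", sizeof(b));     /* bytes after the NUL stay 'X' */

    printf("a[5] = %d, b[5] = %c\n", a[5], b[5]);   /* prints 0 and X */
    return 0;
}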

regards, tom lane




Re: walsender waiting_for_ping spuriously set

2020-08-08 Thread Alvaro Herrera
Pushed.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Amcheck: do rightlink verification with lock coupling

2020-08-08 Thread Peter Geoghegan
On Thu, Aug 6, 2020 at 10:59 PM Andrey M. Borodin  wrote:
> But having complete solution with no false positives seems much better.

Agreed. I know that you didn't pursue this for no reason -- having the
check available makes bt_check_index() a lot more valuable in
practice. It detects what is actually a classic example of subtle
B-Tree corruption (left link corruption), which appears in Modern
B-Tree Techniques in its discussion of corruption detection. It's
actually the canonical example of how B-Tree corruption can be very
subtle in the real world.

I pushed a cleaned up version of this patch just now. I added some
commentary about this canonical example in header comments for the new
function.
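
The overall shape of the check is roughly this (a simplified sketch
with invented helper names, not the committed code):

/* With a share lock held on the target page, couple to the right
 * sibling before releasing, then verify its left link points back: */
share_lock(target);
next_blkno = page_opaque(target)->btpo_next;
share_lock(next);               /* acquire before releasing target */
unlock(target);
if (page_opaque(next)->btpo_prev != target_blkno)
    report_corruption("right sibling's left link does not point back");
unlock(next);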

Thanks
--
Peter Geoghegan




Re: LSM tree for Postgres

2020-08-08 Thread Alexander Korotkov
On Sat, Aug 8, 2020 at 5:07 PM Konstantin Knizhnik
 wrote:
> I agree with you that losing the sequential order of B-Tree pages may have
> a negative impact on performance.
> But it is first of all critical for order-by and range queries, when we
> have to traverse several subsequent leaf pages.
> It is less critical for exact-search or delete/insert operations.
> The efficiency of merge operations mostly depends on how many keys
> will be stored at the same B-Tree page.

What do you mean by "mostly"?  Given PostgreSQL has quite small (8k)
pages, sequential read is several times faster than random read on SSDs
(dozens of times on HDDs).  I don't think this is something to
neglect.

> And it is first of all
> determined by the size of the top index and the key distribution.

How can you be sure that the top index can fit in memory?  On production
systems, typically there are multiple consumers of memory: other
tables, indexes, other LSMs.  This is one of the reasons why LSM
implementations have multiple levels: they don't know in advance which
levels fit in memory.  Another reason is dealing with very large
datasets.  And I believe there is a quite strong reason to keep page
order sequential within a level.

I'm OK with your design for a third-party extension.  It's very cool
to have.  But I'm -1 for something like this to get into core
PostgreSQL, assuming it's feasible to push some effort and get a
state-of-the-art LSM there.

--
Regards,
Alexander Korotkov




2020-08-13 Update + PostgreSQL 13 Beta 3 Release Announcement Draft

2020-08-08 Thread Jonathan S. Katz
Hi,

Attached is a draft of the release announcement for the update release
on 2020-08-13, which also includes the release of PostgreSQL 13 Beta 3.
Reviews and feedback are welcome.

This is a fairly hefty release announcement as it includes notes both
about the update release and the beta. I tried to keep the notes about
Beta 3 focused on the significant changes, with a reference to the open
items page. If you believe I missed something that is significant,
please let me know.

Please be sure all feedback is delivered by 2020-08-12 AoE.

Thanks,

Jonathan
2020-08-13 Cumulative Update Release
------------------------------------

The PostgreSQL Global Development Group has released an update to all supported
versions of our database system, including 12.4, 11.9, 10.14, 9.6.19, and
9.5.23, as well as the 3rd Beta release of PostgreSQL 13. This release fixes
over 50 bugs reported over the last three months.

Please plan to update at your earliest convenience.

A Note on the PostgreSQL 13 Beta
--------------------------------

This release marks the third beta release of PostgreSQL 13 and puts the
community one step closer to general availability this fall.

In the spirit of the open source PostgreSQL community, we strongly encourage you
to test the new features of PostgreSQL 13 in your database systems to help us
eliminate any bugs or other issues that may exist. While we do not advise you to
run PostgreSQL 13 Beta 3 in your production environments, we encourage you to
find ways to run your typical application workloads against this beta release.

Your testing and feedback will help the community ensure that the PostgreSQL 13
release upholds our standards of providing a stable, reliable release of the
world's most advanced open source relational database.

PostgreSQL 9.5 EOL Notice
-------------------------

PostgreSQL 9.5 will stop receiving fixes on February 11, 2021. If you are
running PostgreSQL 9.5 in a production environment, we suggest that you make
plans to upgrade to a newer, supported version of PostgreSQL. Please see our
[versioning policy](https://www.postgresql.org/support/versioning/) for more
information.

Bug Fixes and Improvements
--------------------------

This update also fixes over 50 bugs that were reported in the last several
months. Some of these issues affect only version 12, while others apply to
all supported versions.

Some of these fixes include:

* Fix edge cases in partition pruning involving multiple partition key columns
with multiple or no constraining WHERE clauses.
* Several fixes for query planning and execution involving partitions.
* Fix for determining when to execute a column-specific UPDATE trigger on a
logical replication subscriber.
* `pg_replication_slot_advance()` now updates the oldest xmin and LSN values, as
the failure to do this could prevent resources (e.g. WAL files) from being 
cleaned up.
* Performance improvements for `ts_headline()`.
* Ensure that `pg_read_file()` and related functions read until EOF is reached,
which fixes compatibility with pipes and other virtual files.
* Forbid numeric `NaN` values in jsonpath computations, which exist in neither
SQL nor JSON.
* Several fixes for `NaN` inputs with aggregate functions. This fixes a change
in PostgreSQL 12 where `NaN` values caused the following aggregates to emit a
value of `0` instead of `NaN`: `corr()`, `covar_pop()`, `regr_intercept()`,
`regr_r2()`, `regr_slope()`, `regr_sxx()`, `regr_sxy()`, `regr_syy()`,
`stddev_pop()`, and `var_pop()`.
* `time` and `timetz` values greater than `24:00:00` are now rejected.
* Several fixes for `EXPLAIN`, including a fix for reporting resource usage when
a plan uses parallel workers with "Gather Merge" nodes.
* Fix timing of constraint revalidation in `ALTER TABLE` that could lead to odd
errors.
* Fix for REINDEX CONCURRENTLY that could prevent old values from being included
in future logical decoding output.
* Fix for LATERAL references that could potentially cause crashes during query
execution.
* Use the collation specified for a query when estimating operator costs.
* Fix conflict-checking anomalies in SERIALIZABLE transaction isolation mode.
* Ensure the checkpointer process discards file sync requests when fsync is off.
* Fix issue where `pg_control` could be written out with an inconsistent
checksum, which could lead to the inability to restart the database if it
crashed before the next `pg_control` update.
* Ensure that libpq continues to try to read from the database connection socket
after a write failure, as this allows the connection to collect any final error
messages from the server.
* Report out-of-disk-space errors properly in `pg_dump` and `pg_basebackup`.
* Several fixes for `pg_restore`, including a fix for parallel restore on tables
that have both table-level and column-level privileges.
* Fix for `pg_upgrade` to ensure it runs with `vacuum_defer_cleanup_age` set to
`0`.
* Fix how `pg_rewind` handles just-deleted files in the source data directory.
* Fix failu

Re: LSM tree for Postgres

2020-08-08 Thread Konstantin Knizhnik




On 08.08.2020 21:18, Alexander Korotkov wrote:

> On Sat, Aug 8, 2020 at 5:07 PM Konstantin Knizhnik
>  wrote:
>
>> I agree with you that losing the sequential order of B-Tree pages may have
>> a negative impact on performance.
>> But it is first of all critical for order-by and range queries, when we
>> have to traverse several subsequent leaf pages.
>> It is less critical for exact-search or delete/insert operations.
>> The efficiency of merge operations mostly depends on how many keys
>> will be stored at the same B-Tree page.
>
> What do you mean by "mostly"?  Given PostgreSQL has quite small (8k)
> pages, sequential read is several times faster than random read on SSDs
> (dozens of times on HDDs).  I don't think this is something to
> neglect.

When you insert one record into a B-Tree, the order of pages doesn't 
matter at all.
If you insert ten records into one leaf page, then the order is also not 
so important.
If you insert 100 records, 50 going to one page and 50 to the next page,
then insertion may be faster if the second page follows the first one on
disk.
But such an insertion may cause a page split and thus allocation of a new
page, so sequential write order can still be violated.

>> And it is first of all
>> determined by the size of the top index and the key distribution.
>
> How can you be sure that the top index can fit in memory?  On production
> systems, typically there are multiple consumers of memory: other
> tables, indexes, other LSMs.  This is one of the reasons why LSM
> implementations have multiple levels: they don't know in advance which
> levels fit in memory.  Another reason is dealing with very large
> datasets.  And I believe there is a quite strong reason to keep page
> order sequential within a level.

There is no guarantee that the top index is kept in memory.
But as top index pages are frequently accessed, I hope that the buffer 
manager's cache replacement algorithm does its best to keep them in 
memory.

> I'm OK with your design for a third-party extension.  It's very cool
> to have.  But I'm -1 for something like this to get into core
> PostgreSQL, assuming it's feasible to push some effort and get a
> state-of-the-art LSM there.

I realize that it is not a true LSM.
But still I want to note that it is able to provide a ~10x increase in 
insert speed when the size of the index is comparable with RAM size.
And the "true LSM" from RocksDB shows similar results. Maybe if the size 
of the index is 100 times larger than the size of RAM, RocksDB will be 
significantly faster than Lsm3. But modern servers have 0.5-1TB of RAM.

I can't believe that there are databases with 100TB indexes.





Re: LSM tree for Postgres

2020-08-08 Thread Alexander Korotkov
On Sat, Aug 8, 2020 at 11:49 PM Konstantin Knizhnik
 wrote:
> On 08.08.2020 21:18, Alexander Korotkov wrote:
> > On Sat, Aug 8, 2020 at 5:07 PM Konstantin Knizhnik
> >  wrote:
> >> I agree with you that losing the sequential order of B-Tree pages may have
> >> a negative impact on performance.
> >> But it is first of all critical for order-by and range queries, when we
> >> have to traverse several subsequent leaf pages.
> >> It is less critical for exact-search or delete/insert operations.
> >> The efficiency of merge operations mostly depends on how many keys
> >> will be stored at the same B-Tree page.
> > What do you mean by "mostly"?  Given PostgreSQL has quite small (8k)
> > pages, sequential read is several times faster than random read on SSDs
> > (dozens of times on HDDs).  I don't think this is something to
> > neglect.
>
> When you insert one record into a B-Tree, the order of pages doesn't
> matter at all.
> If you insert ten records into one leaf page, then the order is also not
> so important.
> If you insert 100 records, 50 going to one page and 50 to the next page,
> then insertion may be faster if the second page follows the first one on
> disk.
> But such an insertion may cause a page split and thus allocation of a new
> page, so sequential write order can still be violated.

Sorry, I've no idea what you're getting at.

> >> And it is first of all
> >> determined by the size of the top index and the key distribution.
> > How can you be sure that the top index can fit in memory?  On production
> > systems, typically there are multiple consumers of memory: other
> > tables, indexes, other LSMs.  This is one of the reasons why LSM
> > implementations have multiple levels: they don't know in advance which
> > levels fit in memory.  Another reason is dealing with very large
> > datasets.  And I believe there is a quite strong reason to keep page
> > order sequential within a level.
>
> There is no guarantee that the top index is kept in memory.
> But as top index pages are frequently accessed, I hope that the buffer
> manager's cache replacement algorithm does its best to keep them in
> memory.

So, the top index should be small enough that we can safely assume it
won't be evicted from cache on a heavily loaded production system.
I think it's evident that it should be orders of magnitude smaller
than the total amount of server RAM.

> > I'm OK with your design for a third-party extension.  It's very cool
> > to have.  But I'm -1 for something like this to get into core
> > PostgreSQL, assuming it's feasible to push some effort and get a
> > state-of-the-art LSM there.
> I realize that it is not a true LSM.
> But still I want to note that it is able to provide a ~10x increase in
> insert speed when the size of the index is comparable with RAM size.
> And the "true LSM" from RocksDB shows similar results.

That is very far from being shown.  All you have shown so far is a
naive benchmark.  I don't deny that your design can work out for some
cases.  And it's great that we have the lsm3 extension now.  But I
think for PostgreSQL core we should aim for a better design.

> Maybe if the size of the index is 100 times larger than the size of
> RAM, RocksDB will be significantly faster than Lsm3. But modern
> servers have 0.5-1TB of RAM.
> I can't believe that there are databases with 100TB indexes.

Comparing the whole RAM size to the size of a single index looks plain
wrong to me.  I think we can roughly compare whole RAM size to whole
database size.  But not all of the RAM is available for caching data,
either.  Let's assume half of RAM is used for caching data.  So, a
modern server with 0.5-1TB of RAM, which suffers from random B-tree
insertions and badly needs an LSM-like data structure, runs a database
of 25-50TB (that is, 100 times the 0.25-0.5TB available for cache).
Frankly speaking, there is nothing counterintuitive in that for me.

--
Regards,
Alexander Korotkov




Re: Re: how to create index concurrently on partitioned table

2020-08-08 Thread Michael Paquier
On Sat, Aug 08, 2020 at 01:37:44AM -0500, Justin Pryzby wrote:
> That gave me the idea to layer CIC on top of Reindex, since I think it does
> exactly what's needed.

For now, I would recommend focusing first on 0001 to add support for
partitioned tables and indexes to REINDEX.  CIC is much more
complicated btw, but I am not going into the details now.

-   /*
-* This may be useful when implemented someday; but that day is not today.
-* For now, avoid erroring out when called in a multi-table context
-* (REINDEX SCHEMA) and happen to come across a partitioned table.  The
-* partitions may be reindexed on their own anyway.
-*/
+   /* Avoid erroring out */
if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
This comment does not help, and it actually becomes incorrect, as
reindexing this relkind becomes supported once 0001 is done.

+   case RELKIND_INDEX:
+   reindex_index(inhrelid, false, get_rel_persistence(inhrelid),
+ options | REINDEXOPT_REPORT_PROGRESS);
+   break;
+   case RELKIND_RELATION:
+   (void) reindex_relation(inhrelid,
+ REINDEX_REL_PROCESS_TOAST |
+ REINDEX_REL_CHECK_CONSTRAINTS,
+ options | REINDEXOPT_REPORT_PROGRESS);
ReindexPartitionedRel() fails to consider the concurrent case here for
partition indexes and tables, as reindex_index()/reindex_relation()
are the APIs used in the non-concurrent case.  Once you consider the
concurrent case correctly, we also need to be careful with partitions
that have temporary persistence (note that we don't allow partition
trees to mix persistence types; all partitions have to be either
temporary or permanent).

I think that you are right to make the entry point to handle
partitioned indexes in ReindexIndex() and partitioned tables in
ReindexTable(), but the structure of the patch should be different:
- The second portion of ReindexMultipleTables() should be moved into a
separate routine, taking as input a list of relation OIDs.  This needs
to be extended a bit so that reindex_index() gets called for an index
relkind if the relpersistence is temporary or if we have a
non-concurrent reindex.  The idea is that we finish with a single code
path able to work on a list of relations, and your patch already adds
something close to that with ReindexPartitionedRel().
- We should *not* handle partitioned indexes and/or tables directly in
ReindexRelationConcurrently(), so as not to complicate the logic where
we gather all the indexes of a table/matview.  So I think that the list
of partition indexes/tables to work on should be built directly in
ReindexIndex() and ReindexTable(), and then this should call the
second part of ReindexMultipleTables() refactored in the previous
point.  This way, each partition index gets done individually in its
own transaction.  For a partition table, all indexes of this partition
are rebuilt in the same set of transactions.  For the concurrent case,
we already have reindex_concurrently_swap(), which is able to switch
the dependencies of two indexes within a partition tree, so we can
rely on that so that a failure in the middle of the operation never
leaves a partition structure in an inconsistent state.
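
To be clear, the refactored code path I have in mind would look roughly
like the following (a rough sketch with simplified logic;
REINDEXOPT_CONCURRENTLY is a hypothetical flag here, and this is not a
working patch):

static void
ReindexMultipleInternal(List *relids, int options)
{
    ListCell   *lc;

    foreach(lc, relids)
    {
        Oid         relid = lfirst_oid(lc);

        StartTransactionCommand();
        if ((options & REINDEXOPT_CONCURRENTLY) != 0 &&
            get_rel_persistence(relid) != RELPERSISTENCE_TEMP)
            ReindexRelationConcurrently(relid, options);
        else if (get_rel_relkind(relid) == RELKIND_INDEX)
            reindex_index(relid, false, get_rel_persistence(relid),
                          options | REINDEXOPT_REPORT_PROGRESS);
        else
            (void) reindex_relation(relid,
                                    REINDEX_REL_PROCESS_TOAST |
                                    REINDEX_REL_CHECK_CONSTRAINTS,
                                    options | REINDEXOPT_REPORT_PROGRESS);
        CommitTransactionCommand();
    }
}

This way each entry in the list gets its own transaction, whether it is
a partition index done individually or a partition whose indexes are
rebuilt together.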
--
Michael




Re: Allow some recovery parameters to be changed with reload

2020-08-08 Thread Michael Paquier
On Wed, Aug 05, 2020 at 11:41:49AM -0400, Robert Haas wrote:
> On Sat, Mar 28, 2020 at 7:21 AM Sergei Kornilov  wrote:
>> So...
>> We call restore_command only when walreceiver is stopped.
>> We use restore_command only in startup process - so we have no race 
>> condition between processes.
>> We have some issues here? Or we can just make restore_command reloadable as 
>> attached?
> 
> I don't see the problem here, either. Does anyone else see a problem,
> or some reason not to press forward with this?

Sorry for the late reply.  I have been looking at that stuff again,
and restore_command can be called in the context of a WAL sender
process within the page_read callback of logical decoding via
XLogReadDetermineTimeline(), as readTimeLineHistory() could look for a
timeline history file.  So restore_command is not used only in the
startup process.
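
To spell out the call path (from my reading of the tree; a simplified
sketch):

/*
 * logical_read_xlog_page()            <- walsender's page_read callback
 *     XLogReadDetermineTimeline()
 *         readTimeLineHistory()       <- may need a timeline history file
 *             RestoreArchivedFile()   <- runs restore_command
 */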
--
Michael




Re: Unnecessary delay in streaming replication due to replay lag

2020-08-08 Thread Asim Praveen
I would like to revive this thread by submitting a rebased patch to start 
streaming replication without waiting for the startup process to finish 
replaying all WAL.  The start LSN for streaming is determined to be the LSN 
that points to the beginning of the most recently flushed WAL segment.

The patch passes tests under src/test/recovery and top level “make check”.



v2-0001-Start-WAL-receiver-before-startup-process-replays.patch
Description:  v2-0001-Start-WAL-receiver-before-startup-process-replays.patch


Re: Amcheck: do rightlink verification with lock coupling

2020-08-08 Thread Andrey M. Borodin



> On 8 Aug 2020, at 23:14, Peter Geoghegan wrote:
> 
> I pushed a cleaned up version of this patch just now. I added some
> commentary about this canonical example in header comments for the new
> function.

Thanks for working on this!

Best regards, Andrey Borodin.