Re: finding changed blocks using WAL scanning

2019-04-24 Thread Robert Haas
On Wed, Apr 24, 2019 at 10:10 AM Tomas Vondra wrote: > >I'm still interested in the answer to this question, but I don't see a > >reply that specifically concerns it. Apologies if I have missed one. > > I don't think prefetching WAL blocks is all that important. The WAL > segment was probably

Re: finding changed blocks using WAL scanning

2019-04-24 Thread Tomas Vondra
On Wed, Apr 24, 2019 at 09:25:12AM -0400, Robert Haas wrote: On Mon, Apr 22, 2019 at 9:51 PM Robert Haas wrote: For this particular use case, wouldn't you want to read the WAL itself and use that to issue prefetch requests? Because if you use the .modblock files, the data file blocks will end

Re: finding changed blocks using WAL scanning

2019-04-24 Thread Robert Haas
On Mon, Apr 22, 2019 at 9:51 PM Robert Haas wrote: > For this particular use case, wouldn't you want to read the WAL itself > and use that to issue prefetch requests? Because if you use the > .modblock files, the data file blocks will end up in memory but the > WAL blocks won't, and you'll still

Re: finding changed blocks using WAL scanning

2019-04-23 Thread Tomas Vondra
On Tue, Apr 23, 2019 at 10:09:39AM -0700, Andres Freund wrote: Hi, On 2019-04-23 19:01:29 +0200, Tomas Vondra wrote: On Tue, Apr 23, 2019 at 09:34:54AM -0700, Andres Freund wrote: > Hi, > > On 2019-04-23 18:07:40 +0200, Tomas Vondra wrote: > > Well, the thing is that for prefetching to be

Re: finding changed blocks using WAL scanning

2019-04-23 Thread Andres Freund
Hi, On 2019-04-23 19:01:29 +0200, Tomas Vondra wrote: > On Tue, Apr 23, 2019 at 09:34:54AM -0700, Andres Freund wrote: > > Hi, > > > > On 2019-04-23 18:07:40 +0200, Tomas Vondra wrote: > > > Well, the thing is that for prefetching to be possible you actually have > > > to be a bit behind.

Re: finding changed blocks using WAL scanning

2019-04-23 Thread Tomas Vondra
On Tue, Apr 23, 2019 at 09:34:54AM -0700, Andres Freund wrote: Hi, On 2019-04-23 18:07:40 +0200, Tomas Vondra wrote: Well, the thing is that for prefetching to be possible you actually have to be a bit behind. Otherwise you can't really look forward which blocks will be needed, right? IMHO

Re: finding changed blocks using WAL scanning

2019-04-23 Thread Andres Freund
Hi, On 2019-04-23 18:07:40 +0200, Tomas Vondra wrote: > Well, the thing is that for prefetching to be possible you actually have > to be a bit behind. Otherwise you can't really look forward which blocks > will be needed, right? > > IMHO the main use case for prefetching is when there's a spike

Re: finding changed blocks using WAL scanning

2019-04-23 Thread Tomas Vondra
On Tue, Apr 23, 2019 at 11:43:05AM -0400, Stephen Frost wrote: Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: On Tue, Apr 23, 2019 at 10:22:46AM -0400, Stephen Frost wrote: >* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: >>On Sat, Apr 20, 2019 at 04:21:52PM -0400, Robert

Re: finding changed blocks using WAL scanning

2019-04-23 Thread Stephen Frost
Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > On Tue, Apr 23, 2019 at 10:22:46AM -0400, Stephen Frost wrote: > >* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > >>On Sat, Apr 20, 2019 at 04:21:52PM -0400, Robert Haas wrote: > >>>On Sat, Apr 20, 2019 at 12:42 AM Stephen

Re: finding changed blocks using WAL scanning

2019-04-23 Thread Tomas Vondra
On Tue, Apr 23, 2019 at 10:22:46AM -0400, Stephen Frost wrote: Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: On Sat, Apr 20, 2019 at 04:21:52PM -0400, Robert Haas wrote: >On Sat, Apr 20, 2019 at 12:42 AM Stephen Frost wrote: >>> Oh. Well, I already explained my algorithm

Re: finding changed blocks using WAL scanning

2019-04-23 Thread Stephen Frost
Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > On Sat, Apr 20, 2019 at 04:21:52PM -0400, Robert Haas wrote: > >On Sat, Apr 20, 2019 at 12:42 AM Stephen Frost wrote: > >>> Oh. Well, I already explained my algorithm for doing that upthread, > >>> which I believe would be quite

Re: finding changed blocks using WAL scanning

2019-04-22 Thread Robert Haas
On Mon, Apr 22, 2019 at 7:04 PM Tomas Vondra wrote: > Some time ago there was a discussion about prefetching blocks during > recovery on a standby, and that's a great example of a use case that > benefit from this - look which blocks where modified in the next chunk > of WAL, prefetch them. But

Re: finding changed blocks using WAL scanning

2019-04-22 Thread Bruce Momjian
On Mon, Apr 22, 2019 at 08:52:11PM -0400, Bruce Momjian wrote: > Well, the interesting question is whether the server will generate a > single modblock file for all WAL in pg_wal only right before we are > ready to expire some WAL, or whether modblock files will be generated > offline, perhaps

Re: finding changed blocks using WAL scanning

2019-04-22 Thread Bruce Momjian
On Tue, Apr 23, 2019 at 02:13:29AM +0200, Tomas Vondra wrote: > Well, I understand that concern - all I'm saying is that makes this > useless for some use cases (that may or may not be important enough). > > However, it seems to me those files are guaranteed to be much smaller > than the WAL

Re: finding changed blocks using WAL scanning

2019-04-22 Thread Tomas Vondra
On Mon, Apr 22, 2019 at 07:44:45PM -0400, Bruce Momjian wrote: On Tue, Apr 23, 2019 at 01:21:27AM +0200, Tomas Vondra wrote: On Sat, Apr 20, 2019 at 04:21:52PM -0400, Robert Haas wrote: > On Sat, Apr 20, 2019 at 12:42 AM Stephen Frost wrote: > > > Oh. Well, I already explained my algorithm

Re: finding changed blocks using WAL scanning

2019-04-22 Thread Bruce Momjian
On Tue, Apr 23, 2019 at 01:21:27AM +0200, Tomas Vondra wrote: > On Sat, Apr 20, 2019 at 04:21:52PM -0400, Robert Haas wrote: > > On Sat, Apr 20, 2019 at 12:42 AM Stephen Frost wrote: > > > > Oh. Well, I already explained my algorithm for doing that upthread, > > > > which I believe would be

Re: finding changed blocks using WAL scanning

2019-04-22 Thread Tomas Vondra
On Sat, Apr 20, 2019 at 04:21:52PM -0400, Robert Haas wrote: On Sat, Apr 20, 2019 at 12:42 AM Stephen Frost wrote: > Oh. Well, I already explained my algorithm for doing that upthread, > which I believe would be quite cheap. > > 1. When you generate the .modblock files, stick all the block >

Re: finding changed blocks using WAL scanning

2019-04-22 Thread Tomas Vondra
On Mon, Apr 22, 2019 at 01:15:49PM -0400, Bruce Momjian wrote: On Mon, Apr 22, 2019 at 01:11:22PM -0400, Robert Haas wrote: On Mon, Apr 22, 2019 at 12:35 PM Bruce Momjian wrote: > I assumed the modblock files would be stored in the WAL archive so some > external tools could generate

Re: finding changed blocks using WAL scanning

2019-04-22 Thread Tomas Vondra
On Thu, Apr 18, 2019 at 04:25:24PM -0400, Robert Haas wrote: On Thu, Apr 18, 2019 at 3:51 PM Bruce Momjian wrote: How would you choose the STARTLSN/ENDLSN? If you could do it per checkpoint, rather than per-WAL, I think that would be great. I thought of that too. It seems appealing,

Re: finding changed blocks using WAL scanning

2019-04-22 Thread Bruce Momjian
On Mon, Apr 22, 2019 at 01:11:22PM -0400, Robert Haas wrote: > On Mon, Apr 22, 2019 at 12:35 PM Bruce Momjian wrote: > > I assumed the modblock files would be stored in the WAL archive so some > > external tools could generate incremental backups using just the WAL > > files. I assumed they

Re: finding changed blocks using WAL scanning

2019-04-22 Thread Robert Haas
On Mon, Apr 22, 2019 at 12:35 PM Bruce Momjian wrote: > I assumed the modblock files would be stored in the WAL archive so some > external tools could generate incremental backups using just the WAL > files. I assumed they would also be sent to standby servers so > incremental backups could be

Re: finding changed blocks using WAL scanning

2019-04-22 Thread Bruce Momjian
On Mon, Apr 22, 2019 at 12:15:32PM -0400, Robert Haas wrote: > On Mon, Apr 22, 2019 at 11:48 AM Bruce Momjian wrote: > > My point is that you would normally only remove the modblock file when 4 > > is removed because this modblock files is useful for incremental backups > > from base backups that

Re: finding changed blocks using WAL scanning

2019-04-22 Thread Robert Haas
On Mon, Apr 22, 2019 at 11:48 AM Bruce Momjian wrote: > My point is that most WAL archive tools will order and remove files > based on their lexical ordering, so if you put the start first, the file > will normally be removed when it should be kept, e.g., if you have WAL > files like: > >

Re: finding changed blocks using WAL scanning

2019-04-22 Thread Robert Haas
On Sun, Apr 21, 2019 at 10:21 PM Michael Paquier wrote: > If you create the extra file when a segment is finished and we switch > to a new one, then the extra work would happen for a random backend, > and it is going to be more costly to scan a 1GB segment than a 16MB > segment as a one-time

Re: finding changed blocks using WAL scanning

2019-04-22 Thread Bruce Momjian
On Mon, Apr 22, 2019 at 11:20:43AM +0900, Michael Paquier wrote: > On Sat, Apr 20, 2019 at 12:21:36AM -0400, Robert Haas wrote: > > The segment size doesn't have much to do with it. If you make > > segments bigger, you'll have to scan fewer larger ones; if you make > > them smaller, you'll have

Re: finding changed blocks using WAL scanning

2019-04-22 Thread Bruce Momjian
On Sun, Apr 21, 2019 at 06:24:50PM -0400, Robert Haas wrote: > On Sat, Apr 20, 2019 at 5:54 PM Bruce Momjian wrote: > > Good point. You mentioned: > > > > It seems better to me to give the files names like > > ${TLI}.${STARTLSN}.${ENDLSN}.modblock, e.g. > >

Re: finding changed blocks using WAL scanning

2019-04-21 Thread Michael Paquier
On Sat, Apr 20, 2019 at 12:21:36AM -0400, Robert Haas wrote: > The segment size doesn't have much to do with it. If you make > segments bigger, you'll have to scan fewer larger ones; if you make > them smaller, you'll have more smaller ones. The only thing that > really matters is the amount of

Re: finding changed blocks using WAL scanning

2019-04-21 Thread Robert Haas
On Sat, Apr 20, 2019 at 5:54 PM Bruce Momjian wrote: > Good point. You mentioned: > > It seems better to me to give the files names like > ${TLI}.${STARTLSN}.${ENDLSN}.modblock, e.g. > 0001.00016858.0001687DBBB8.modblock, so that you can > see

Re: finding changed blocks using WAL scanning

2019-04-20 Thread Bruce Momjian
On Sat, Apr 20, 2019 at 04:17:08PM -0400, Robert Haas wrote: > On Sat, Apr 20, 2019 at 9:18 AM Bruce Momjian wrote: > > > I think you've got to prevent the WAL from being removed until a > > > .modblock file has been written. In more detail, you should (a) scan > > > all the WAL segments that

Re: finding changed blocks using WAL scanning

2019-04-20 Thread Robert Haas
On Sat, Apr 20, 2019 at 12:42 AM Stephen Frost wrote: > > Oh. Well, I already explained my algorithm for doing that upthread, > > which I believe would be quite cheap. > > > > 1. When you generate the .modblock files, stick all the block > > references into a buffer. qsort(). Dedup. Write out

Re: finding changed blocks using WAL scanning

2019-04-20 Thread Robert Haas
On Sat, Apr 20, 2019 at 9:18 AM Bruce Momjian wrote: > > I think you've got to prevent the WAL from being removed until a > > .modblock file has been written. In more detail, you should (a) scan > > all the WAL segments that will be summarized in the .modblock file, > > (b) write the file under

Re: finding changed blocks using WAL scanning

2019-04-20 Thread Bruce Momjian
On Sat, Apr 20, 2019 at 12:09:42AM -0400, Robert Haas wrote: > On Thu, Apr 18, 2019 at 5:47 PM Bruce Momjian wrote: > > How would the modblock file record all the modified blocks across > > restarts and crashes? I assume that 1G of WAL would not be available > > for scanning. I suppose that

Re: finding changed blocks using WAL scanning

2019-04-20 Thread Bruce Momjian
On Sat, Apr 20, 2019 at 12:21:36AM -0400, Robert Haas wrote: > As to that, what I'm proposing here is no different than what we are > already doing with physical and logical replication, except that it's > probably a bit cheaper. Physical replication reads all the WAL and > sends it all out over

Re: finding changed blocks using WAL scanning

2019-04-19 Thread Stephen Frost
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Fri, Apr 19, 2019 at 8:39 PM Stephen Frost wrote: > > While I do think we should at least be thinking about the load caused > > from scanning the WAL to generate a list of blocks that are changed, the > > load I was more concerned with

Re: finding changed blocks using WAL scanning

2019-04-19 Thread Robert Haas
On Thu, Apr 18, 2019 at 8:38 PM Michael Paquier wrote: > On Thu, Apr 18, 2019 at 03:51:14PM -0400, Robert Haas wrote: > > I was thinking that a dedicated background worker would be a good > > option, but Stephen Frost seems concerned (over on the other thread) > > about how much load that would

Re: finding changed blocks using WAL scanning

2019-04-19 Thread Robert Haas
On Fri, Apr 19, 2019 at 8:39 PM Stephen Frost wrote: > While I do think we should at least be thinking about the load caused > from scanning the WAL to generate a list of blocks that are changed, the > load I was more concerned with in the other thread is the effort > required to actually merge

Re: finding changed blocks using WAL scanning

2019-04-19 Thread Robert Haas
On Thu, Apr 18, 2019 at 5:47 PM Bruce Momjian wrote: > How would the modblock file record all the modified blocks across > restarts and crashes? I assume that 1G of WAL would not be available > for scanning. I suppose that writing a modblock file to some PGDATA > location when WAL is removed

Re: finding changed blocks using WAL scanning

2019-04-19 Thread Stephen Frost
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Apr 15, 2019 at 11:45 PM Michael Paquier wrote: > > Any caller of XLogWrite() could switch to a new segment once the > > current one is done, and I am not sure that we would want some random > > backend to potentially slow down to

Re: finding changed blocks using WAL scanning

2019-04-18 Thread Michael Paquier
On Thu, Apr 18, 2019 at 03:51:14PM -0400, Robert Haas wrote: > I was thinking that a dedicated background worker would be a good > option, but Stephen Frost seems concerned (over on the other thread) > about how much load that would generate. That never really occurred > to me as a serious issue

Re: finding changed blocks using WAL scanning

2019-04-18 Thread Andres Freund
On 2019-04-18 17:47:56 -0400, Bruce Momjian wrote: > I can see a 1GB marker being used for that. It would prevent an > incremental backup from being done until the first 1G modblock files was > written, since until then there is no record of modified blocks, but > that seems fine. A 1G marker

Re: finding changed blocks using WAL scanning

2019-04-18 Thread Bruce Momjian
On Thu, Apr 18, 2019 at 04:25:24PM -0400, Robert Haas wrote: > On Thu, Apr 18, 2019 at 3:51 PM Bruce Momjian wrote: > > How would you choose the STARTLSN/ENDLSN? If you could do it per > > checkpoint, rather than per-WAL, I think that would be great. > > I thought of that too. It seems

Re: finding changed blocks using WAL scanning

2019-04-18 Thread Robert Haas
On Thu, Apr 18, 2019 at 3:51 PM Bruce Momjian wrote: > How would you choose the STARTLSN/ENDLSN? If you could do it per > checkpoint, rather than per-WAL, I think that would be great. I thought of that too. It seems appealing, because you probably only really care whether a particular block

Re: finding changed blocks using WAL scanning

2019-04-18 Thread Bruce Momjian
On Thu, Apr 18, 2019 at 03:43:30PM -0400, Robert Haas wrote: > You can make it kinda make sense by saying "the blocks modified by > records *beginning in* segment XYZ" or alternatively "the blocks > modified by records *ending in* segment XYZ", but that seems confusing > to me. For example,

Re: finding changed blocks using WAL scanning

2019-04-18 Thread Robert Haas
On Mon, Apr 15, 2019 at 11:45 PM Michael Paquier wrote: > On Mon, Apr 15, 2019 at 09:04:13PM -0400, Robert Haas wrote: > > That is pretty much exactly what I was intending to propose. > > Any caller of XLogWrite() could switch to a new segment once the > current one is done, and I am not sure

Re: finding changed blocks using WAL scanning

2019-04-18 Thread Robert Haas
On Mon, Apr 15, 2019 at 10:22 PM Bruce Momjian wrote: > > > I am thinking tools could retain modblock files along with WAL, could > > > pull full-page-writes from WAL, or from PGDATA. It avoids the need to > > > scan 16MB WAL files, and the WAL files and modblock files could be > > > expired

Re: finding changed blocks using WAL scanning

2019-04-15 Thread Michael Paquier
On Mon, Apr 15, 2019 at 09:04:13PM -0400, Robert Haas wrote: > That is pretty much exactly what I was intending to propose. Any caller of XLogWrite() could switch to a new segment once the current one is done, and I am not sure that we would want some random backend to potentially slow down to do

Re: finding changed blocks using WAL scanning

2019-04-15 Thread Bruce Momjian
On Mon, Apr 15, 2019 at 09:04:13PM -0400, Robert Haas wrote: > On Mon, Apr 15, 2019 at 4:31 PM Bruce Momjian wrote: > > Can I throw out a simple idea? What if, when we finish writing a WAL > > file, we create a new file 00010001.modblock which > > lists all the heap/index files

Re: finding changed blocks using WAL scanning

2019-04-15 Thread Robert Haas
On Mon, Apr 15, 2019 at 4:31 PM Bruce Momjian wrote: > Can I throw out a simple idea? What if, when we finish writing a WAL > file, we create a new file 00010001.modblock which > lists all the heap/index files and block numbers modified in that WAL > file? How much does that

Re: finding changed blocks using WAL scanning

2019-04-15 Thread Bruce Momjian
On Wed, Apr 10, 2019 at 08:11:11PM -0400, Robert Haas wrote: > On Wed, Apr 10, 2019 at 5:49 PM Robert Haas wrote: > > There is one thing that does worry me about the file-per-LSN-range > > approach, and that is memory consumption when trying to consume the > > information. Suppose you have a

Re: finding changed blocks using WAL scanning

2019-04-11 Thread Kyotaro HORIGUCHI
At Thu, 11 Apr 2019 10:00:35 -0700, Ashwin Agrawal wrote in > On Thu, Apr 11, 2019 at 6:27 AM Robert Haas wrote: > > > On Thu, Apr 11, 2019 at 3:52 AM Peter Eisentraut > > wrote: > > > I had in mind that you could have different overlapping incremental > > > backup jobs in existence at the

Re: finding changed blocks using WAL scanning

2019-04-11 Thread Ashwin Agrawal
On Thu, Apr 11, 2019 at 6:27 AM Robert Haas wrote: > On Thu, Apr 11, 2019 at 3:52 AM Peter Eisentraut > wrote: > > I had in mind that you could have different overlapping incremental > > backup jobs in existence at the same time. Maybe a daily one to a > > nearby disk and a weekly one to a

Re: finding changed blocks using WAL scanning

2019-04-11 Thread Ashwin Agrawal
On Wed, Apr 10, 2019 at 2:50 PM Robert Haas wrote: > Over at > https://www.postgresql.org/message-id/CA%2BTgmobFVe4J4AA7z9OMUzKnm09Tt%2BsybhxeL_Ddst3q3wqpzQ%40mail.gmail.com > I mentioned parsing the WAL to extract block references so that > incremental backup could efficiently determine which

Re: finding changed blocks using WAL scanning

2019-04-11 Thread Robert Haas
On Thu, Apr 11, 2019 at 3:52 AM Peter Eisentraut wrote: > I had in mind that you could have different overlapping incremental > backup jobs in existence at the same time. Maybe a daily one to a > nearby disk and a weekly one to a faraway cloud. Each one of these > would need a separate

Re: finding changed blocks using WAL scanning

2019-04-11 Thread Peter Eisentraut
On 2019-04-10 23:49, Robert Haas wrote: > It seems to me that there are basically two ways of storing this kind > of information, plus a bunch of variants. One way is to store files > that cover a range of LSNs, and basically contain a synopsis of the > WAL for those LSNs. You omit all the

Re: finding changed blocks using WAL scanning

2019-04-10 Thread Robert Haas
On Wed, Apr 10, 2019 at 5:49 PM Robert Haas wrote: > There is one thing that does worry me about the file-per-LSN-range > approach, and that is memory consumption when trying to consume the > information. Suppose you have a really high velocity system. I don't > know exactly what the busiest

finding changed blocks using WAL scanning

2019-04-10 Thread Robert Haas
Over at https://www.postgresql.org/message-id/CA%2BTgmobFVe4J4AA7z9OMUzKnm09Tt%2BsybhxeL_Ddst3q3wqpzQ%40mail.gmail.com I mentioned parsing the WAL to extract block references so that incremental backup could efficiently determine which blocks needed to be copied. Ashwin replied in