Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

2022-07-28 Thread alvhe...@alvh.no-ip.org
Hello, On 2022-Jun-29, Imseih (AWS), Sami wrote: > > Would you mind trying the second attached to abtain detailed log on > > your testing environment? With the patch, the modified TAP test yields > > the log lines like below. > > Thanks for this. I will apply this to the testing environment and

Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

2022-03-23 Thread alvhe...@alvh.no-ip.org
On 2022-Mar-07, Imseih (AWS), Sami wrote: > I have gone ahead and backpatched this all the way to 10 as well. Thanks! I pushed this now. I edited the test though: I don't understand why you went to the trouble of setting stuff in order to call 'pg_ctl promote' (in different ways for older

Re: archive status ".ready" files may be created too early

2021-08-31 Thread alvhe...@alvh.no-ip.org
On 2021-Sep-01, Michael Paquier wrote: > That's about 515e3d8, right? Yes. > I have not looked in details at what you have here, but this produces > a compilation warning on Windows for me with this part of the patch: This seems a tiny speck in a sea of bogosity. If you want to silence the

Re: archive status ".ready" files may be created too early

2021-08-31 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-31, Andres Freund wrote: > Maybe, but this is getting uglier and uglier. > > I think patch should be reverted. It's not in a state that's appropriate for > the backbranches. Yeah, that's becoming my conclusion too -- undo that, and start from scratch using the other idea. --

Re: archive status ".ready" files may be created too early

2021-08-30 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-28, Andres Freund wrote: > While rebasing the aio patchset ontop of HEAD I noticed that this commit added > another atomic operation to XLogWrite() with archiving enabled. The WAL write > path is really quite hot, and at least some of the > NotifySegmentsReadyForArchive() calls are

Re: prevent immature WAL streaming

2021-08-25 Thread alvhe...@alvh.no-ip.org
BTW while going about testing this, I noticed that we forbid pg_walfile_name() while in recovery. That restriction was added by commit 370f770c15a4 because ThisTimeLineID was not set correctly during recovery. That was supposed to be fixed by commit 1148e22a82ed, so I thought that it should be

Re: prevent immature WAL streaming

2021-08-25 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-25, Jakub Wartak wrote: > In order to get reliable reproducer and get proper the fault injection > instead of playing with really filling up fs, apparently one could > substitute fd with fd of /dev/full using e.g. dup2() so that every > write is going to throw this error too: Oh,

Re: prevent immature WAL streaming

2021-08-25 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-24, Bossart, Nathan wrote: > If moving RegisterSegmentBoundary() is sufficient to prevent the flush > pointer from advancing before we register the boundary, I bet we could > also remove the WAL writer nudge. Can you elaborate on this? I'm not sure I see the connection. > Another

Re: prevent immature WAL streaming

2021-08-24 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-24, Bossart, Nathan wrote: > I wonder if we need to move the call to RegisterSegmentBoundary() to > somewhere before WALInsertLockRelease() for this to work as intended. > Right now, boundary registration could take place after the flush > pointer has been advanced, which means

Re: archive status ".ready" files may be created too early

2021-08-23 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-23, Bossart, Nathan wrote: > On 8/23/21, 12:55 PM, "alvhe...@alvh.no-ip.org" > wrote: > > Thanks. I've pushed this now to all live branches. > > Thank you! I appreciate the thorough reviews. Should we make a new > thread for the streaming replicati

Re: archive status ".ready" files may be created too early

2021-08-23 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-23, Bossart, Nathan wrote: > Ah, okay. BTW the other changes you mentioned made sense to me. Thanks. I've pushed this now to all live branches. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ It does it in a really, really complicated way why

Re: archive status ".ready" files may be created too early

2021-08-23 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-23, alvhe...@alvh.no-ip.org wrote: > The only way .ready files are created is that XLogNotifyWrite() is > called. For regular WAL files during regular operation, that only > happens in XLogNotifyWriteSeg(). That, in turn, only happens in > NotifySegmentsRea

Re: archive status ".ready" files may be created too early

2021-08-23 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-23, Bossart, Nathan wrote: > Sorry, I'm still not following this one. If we skipped creating > .ready segments due to a crash, we rely on RemoveOldXlogFiles() to > create them as needed in the end-of-recovery checkpoint. If a record > fits perfectly in the end of a segment, we'll

Re: archive status ".ready" files may be created too early

2021-08-23 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-23, Bossart, Nathan wrote: > Hm. My expectation would be that if there are no registrations, we > cannot create .ready files for the flushed segments. The scenario > where I can see that happening is when a record gets flushed to disk > prior to registration. In that case, we'll

Re: archive status ".ready" files may be created too early

2021-08-23 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-21, Bossart, Nathan wrote: > > Well, the thing I realized is that these three helper functions have > > exactly one caller each. I think the compiler is going to inline them, > > so there isn't going to be a function call in the assembly. I haven't > > verified this, though. > >

Re: archive status ".ready" files may be created too early

2021-08-20 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-20, Bossart, Nathan wrote: > I was looking at moving the function calls out of the spinlock region. > I don't think the functions are doing anything too expensive, and they > help clean up NotifySegmentsReadyForArchive() quite a bit, but I > understand why it might be against project

Re: archive status ".ready" files may be created too early

2021-08-20 Thread alvhe...@alvh.no-ip.org
Attached is v14 which uses a separate spinlock. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ "No me acuerdo, pero no es cierto. No es cierto, y si fuera cierto, no me acuerdo." (Augusto Pinochet a una corte de justicia) >From

Re: archive status ".ready" files may be created too early

2021-08-20 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-20, Bossart, Nathan wrote: > > On Fri, Aug 20, 2021 at 1:29 PM Bossart, Nathan wrote: > >> This led me to revisit the two-element > >> approach that was discussed upthread. What if we only stored the > >> earliest and latest segment boundaries at any given time? Once the > >>

Re: archive status ".ready" files may be created too early

2021-08-20 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-20, Bossart, Nathan wrote: > On 8/20/21, 8:29 AM, "Robert Haas" wrote: > > We can't expand the hash table either. It has an initial and maximum > > size of 16 elements, which means it's basically an expensive array, > > and which also means that it imposes a new limit of 16 * > >

Re: archive status ".ready" files may be created too early

2021-08-20 Thread alvhe...@alvh.no-ip.org
Two things. 1. We use a hash table in shared memory. That's great. The part that's not so great is that in both places where we read items from it, we have to iterate in some way. This seems a bit silly. An array would serve us better, if only we could expand it as needed. However,

Re: archive status ".ready" files may be created too early

2021-08-20 Thread alvhe...@alvh.no-ip.org
In v12 I moved the code around a bit and reworded some comments. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ >From 7d1475578431e265a5e7f8b94a6b0791b68a9a31 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Tue, 17 Aug 2021 03:52:43 + Subject: [PATCH

Re: archive status ".ready" files may be created too early

2021-08-18 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-18, Bossart, Nathan wrote: > As long as XLogBackgroundFlush() found work to do, it would take care > of notifying, but I don't think we can depend on that. However, since > we're primarily using the WAL writer to take care of the case when the > record has already been flushed,

Re: archive status ".ready" files may be created too early

2021-08-18 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-18, Bossart, Nathan wrote: > I'll add it after XLogBackgroundFlush(). I was wondering which would be better: before or after. XLogBackgroundFlush would do it anyway, so if you do it after then it's not clear to me that it'd do anything (I mean we should not do any new calls of

Re: archive status ".ready" files may be created too early

2021-08-18 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-18, alvhe...@alvh.no-ip.org wrote: > I realize this means there's a contradiction with my previous argument, > in that synchronous transaction commit calls XLogWrite at some point, so > we *are* putting the client-connected backend in charge of creating the > notify fil

Re: archive status ".ready" files may be created too early

2021-08-18 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-18, Bossart, Nathan wrote: > On 8/18/21, 10:06 AM, "alvhe...@alvh.no-ip.org" > wrote: > > So that comment suggests that we should give the responsibility to bgwriter. > > This seems good enough to me. I suppose if bgwriter has a long run of > &

Re: archive status ".ready" files may be created too early

2021-08-18 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-17, alvhe...@alvh.no-ip.org wrote: > However, why do it in a WAL-producing client-connected backend? It > strikes me as a bad thing to do, because you are possibly causing delays > for client-connected backends. I suggest that we should give this task > to the WAL wr

Re: archive status ".ready" files may be created too early

2021-08-17 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-17, Bossart, Nathan wrote: > On 8/17/21, 2:13 PM, "alvhe...@alvh.no-ip.org" > wrote: > > > So, why isn't it that we call Register in XLogInsertRecord, and > > Notify in XLogWrite? > > We do. However, we also call NotifySegmentsReadyForArchiv

Re: archive status ".ready" files may be created too early

2021-08-17 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-17, Bossart, Nathan wrote: > The main reason for registering the boundaries in XLogInsertRecord() > is that it has the required information about the WAL record > boundaries. I do not think that XLogWrite() has this information. Doh, of course. So, why isn't it that we call

Re: archive status ".ready" files may be created too early

2021-08-17 Thread alvhe...@alvh.no-ip.org
The thing I still don't understand about this patch is why we call RegisterSegmentBoundaryEntry and NotifySegmentsReadyForArchive in XLogInsertRecord. My model concept of this would have these routines called only in XLogFlush / XLogWrite, which are conceptually about transferring data from

Re: archive status ".ready" files may be created too early

2021-08-17 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-17, Bossart, Nathan wrote: > I think we are in agreement. If we assume that the flush pointer > jumps along record boundaries and segment boundaries, the solution > would be to avoid using the flush pointer when it points to a segment > boundary (given that the segment boundary is

Re: archive status ".ready" files may be created too early

2021-08-17 Thread alvhe...@alvh.no-ip.org
On 2021-Aug-17, Bossart, Nathan wrote: > On 8/16/21, 5:09 PM, "alvhe...@alvh.no-ip.org" > wrote: > > The reason for the latter is that I suspect the segment boundary map > > will also have a use to fix the streaming replication issue I mentioned > > earlier

Re: archive status ".ready" files may be created too early

2021-08-16 Thread alvhe...@alvh.no-ip.org
So why do we call this structure "record boundary map", when the boundaries it refers to are segment boundaries? I think we should call it "segment boundary map" instead ... and also I think we should use that name in the lock that protects it, so instead of ArchNotifyLock, it could be

Re: Preventing abort() and exit() calls in libpq

2021-07-02 Thread alvhe...@alvh.no-ip.org
On 2021-Jul-02, Jacob Champion wrote: > Only src/common: > > controldata_utils_shlib.o: > U close > U __errno_location > U exit Actually, I do see these in the .o files as well, but they don't make it to the .a file. gcc here is 8.3.0. --

Re: libpq debug log

2021-04-01 Thread 'alvhe...@alvh.no-ip.org'
On 2021-Apr-01, 'alvhe...@alvh.no-ip.org' wrote: > Ooh, wow ... now that is a silly bug! Thanks, I'll push the fix in a > minute. It still didn't fix it! Drongo is now reporting a difference in the expected trace -- and the differences all seem to be message lengths. Now that is

Re: libpq debug log

2021-04-01 Thread 'alvhe...@alvh.no-ip.org'
On 2021-Apr-01, Tom Lane wrote: > So drongo is still failing, and after a bit of looking around at > other code I finally got hit with the clue hammer. Per port.h: > > * On Windows, setvbuf() does not support _IOLBF mode, and interprets that > * as _IOFBF. To add insult to injury,

Re: libpq debug log

2021-04-01 Thread 'alvhe...@alvh.no-ip.org'
On 2021-Apr-01, Tom Lane wrote: > "'alvhe...@alvh.no-ip.org'" writes: > > On 2021-Apr-01, Tom Lane wrote: > >> BTW, what in the world is this supposed to accomplish? > >> -(long long) rows_to_send); > >> +

Re: libpq debug log

2021-04-01 Thread 'alvhe...@alvh.no-ip.org'
On 2021-Apr-01, Tom Lane wrote: > BTW, what in the world is this supposed to accomplish? > > -(long long) rows_to_send); > +(1L << 62) + (long long) rows_to_send); > > Various buildfarm members are complaining that the shift distance > is more

Re: libpq debug log

2021-04-01 Thread 'alvhe...@alvh.no-ip.org'
On 2021-Mar-31, Tom Lane wrote: > I wrote: > > That is weird - only test 4 (of 8) runs at all, the rest seem to > > fail to connect. What's different about pipelined_insert? > > Oh ... there's a pretty obvious theory. pipelined_insert is > the only one that is not asked to write a trace file.

Re: libpq debug log

2021-03-31 Thread 'alvhe...@alvh.no-ip.org'
On 2021-Mar-31, Tom Lane wrote: > While this may have little to do with drongo's issue, > I'm going to take exception to this bit that I see that > the patch added to PQtrace(): > > /* Make the trace stream line-buffered */ > setvbuf(debug_port, NULL, _IOLBF, 0); Mea culpa. I added

Re: libpq debug log

2021-03-31 Thread 'alvhe...@alvh.no-ip.org'
On 2021-Mar-31, 'alvhe...@alvh.no-ip.org' wrote: > .. oh, I think we forgot to set conn->Pfdebug = NULL when creating the > connection. So when we do PQtrace(), the first thing it does is > PQuntrace(), and then that tries to do fflush(conn->Pfdebug) ---> crash. > So this sh

Re: libpq debug log

2021-03-31 Thread 'alvhe...@alvh.no-ip.org'
On 2021-Mar-31, Tom Lane wrote: > I wrote: > > That is weird - only test 4 (of 8) runs at all, the rest seem to > > fail to connect. What's different about pipelined_insert? > > Oh ... there's a pretty obvious theory. pipelined_insert is > the only one that is not asked to write a trace file.

Re: libpq debug log

2021-03-31 Thread 'alvhe...@alvh.no-ip.org'
On 2021-Mar-31, Tom Lane wrote: > I think this is a timing problem that's triggered (on some machines) > by force_parallel_mode = regress. Looking at spurfowl's latest > failure of this type, the postmaster log shows > > 2021-03-31 14:34:54.982 EDT [18233:15] 001_libpq_pipeline.pl LOG: execute

Re: libpq debug log

2021-03-30 Thread 'alvhe...@alvh.no-ip.org'
On 2021-Mar-30, 'alvhe...@alvh.no-ip.org' wrote: > got expected ERROR: cannot insert multiple commands into a prepared statement > got expected division-by-zero Eh? this is not at all expected, of course, but clearly not PQtrace's fault. I'll look tomorrow. -- Álvaro H

Re: libpq debug log

2021-03-30 Thread 'alvhe...@alvh.no-ip.org'
So crake failed. The failure is that it doesn't print the DataRow messages. That's quite odd. We see this in the trace log: Mar 30 20:05:15 # F 54 Parse"" "SELECT 1.0/g FROM generate_series(3, -1, -1) g" 0 Mar 30 20:05:15 # F 12 Bind "" "" 0 0 0 Mar 30 20:05:15 # F

Re: libpq debug log

2021-03-30 Thread 'alvhe...@alvh.no-ip.org'
Okay, pushed this patch and the new testing for it based on libpq_pipeline. We'll see how the buildfarm likes it. I made some further changes to the last version; user-visibly, I split the trace flags in two, keeping the timestamp suppression separate from the redacting feature for regression

Re: libpq debug log

2021-03-28 Thread 'alvhe...@alvh.no-ip.org'
On 2021-Mar-29, tsunakawa.ta...@fujitsu.com wrote: > > (Hey, what the heck is that "Z" at the end of the message? I thought > > the error ended right at the \x00 ...) > > We'll investigate these issues. For what it's worth, I did fix this problem in patch 0005 that I attached. The problem was

Re: libpq debug log

2021-03-27 Thread alvhe...@alvh.no-ip.org
On 2021-Mar-27, alvhe...@alvh.no-ip.org wrote: > This last one uses libpq_pipeline -t and verifies the output against an > expected trace file. Applies on top of all the previous patches. I > attach the whole lot, so that the CF bot has a chance to run it. All tests pass, but C

Re: libpq debug log

2021-03-27 Thread alvhe...@alvh.no-ip.org
On 2021-Mar-26, alvhe...@alvh.no-ip.org wrote: > Proposed changes on top of v29. This last one uses libpq_pipeline -t and verifies the output against an expected trace file. Applies on top of all the previous patches. I attach the whole lot, so that the CF bot has a chance to run it. I

Re: libpq debug log

2021-03-26 Thread alvhe...@alvh.no-ip.org
Proposed changes on top of v29. -- Álvaro Herrera Valdivia, Chile >From b32ae3805bb28553c0a1cf308c6ed27f58576f3c Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 26 Mar 2021 19:12:12 -0300 Subject: [PATCH 1/5] libpq_pipeline: add -t support for PQtrace ---

Re: libpq debug log

2021-03-26 Thread alvhe...@alvh.no-ip.org
Hello I added an option to the new libpq_pipeline program that it activates libpq trace. It works nicely and I think we can add that to the regression tests. However I have two observations. 1. The trace output for the error message won't be very nice, because it prints line numbers. So if I

Re: libpq debug log

2021-03-10 Thread 'alvhe...@alvh.no-ip.org'
On 2021-Mar-10, Tom Lane wrote: > Or we could rethink the logic. It's not quite clear to me, after > all this time, why getRowDescriptions() et al are allowed to > move inStart themselves rather than letting the main loop in > pqParseInput3 do it. It might well be an artifact of having not >

Re: libpq debug log

2021-03-10 Thread 'alvhe...@alvh.no-ip.org'
On 2021-Mar-10, Tom Lane wrote: > "'alvhe...@alvh.no-ip.org'" writes: > > After staring at it a couple of times, I think that the places in > > pqParseInput3() where there's a comment "... moves inStart itself" and > > then "continue;" sh

Re: libpq debug log

2021-03-10 Thread 'alvhe...@alvh.no-ip.org'
On 2021-Mar-10, 'alvhe...@alvh.no-ip.org' wrote: > I also found that DataRow messages are not being printed. This seems to > fix that in the normal case and singlerow, at least in pipeline mode. > Didn't verify the non-pipeline mode. Hm, and RowDescription ('T') messages are also

Re: libpq debug log

2021-03-10 Thread 'alvhe...@alvh.no-ip.org'
I also found that DataRow messages are not being printed. This seems to fix that in the normal case and singlerow, at least in pipeline mode. Didn't verify the non-pipeline mode. -- Álvaro Herrera39°49'30"S 73°17'W "Nunca se desea ardientemente lo que solo se desea

Re: libpq debug log

2021-03-10 Thread 'alvhe...@alvh.no-ip.org'
On 2021-Mar-10, iwata@fujitsu.com wrote: > Hi all, > > Following all reviewer's advice, I have created a new patch. > > In this patch, I add only two tracing entry points; I call > pqTraceOutputMsg(PGconn conn, int msgCursor, PGCommSource commsource) in > pqParseInput3 () and pqPutMsgEnd

Re: libpq debug log

2021-03-05 Thread alvhe...@alvh.no-ip.org
On 2021-Mar-05, tsunakawa.ta...@fujitsu.com wrote: > From: Tom Lane > > But I think passing the message start address explicitly might be > > better than having it understand the buffering behavior in enough > > detail to know where to find the message. Part of the point here > > (IMO) is to

Re: libpq debug log

2021-02-23 Thread alvhe...@alvh.no-ip.org
I'll give this another look tomorrow, but I wanted to pass along that I prefer libpq-trace.{c,h} instead of libpq-logging. I also renamed variable "pin" and pgindented. I don't have any major reservations about this patch now, so I'll mark it ready-for-committer in case somebody else wants to