Dear hackers,
>
> Thanks to both of you. I have pushed the patch.
>
I have been tracking the BF animals these days, and this failure has not been
seen anymore.
I think we can close the topic. Again, thanks for all the efforts.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
On Thu, Jan 11, 2024 at 8:15 AM Hayato Kuroda (Fujitsu)
wrote:
Dear Alexander, Amit,
> > But tomorrow it could be for other tables and if we change this
> > TRUNCATE logic for pg_largeobject (of which chances are less) then
> > there is always a chance that one misses changing this comment. I feel
> > keeping it generic in this case would be better as the pro
10.01.2024 13:37, Amit Kapila wrote:
But tomorrow it could be for other tables and if we change this
TRUNCATE logic for pg_largeobject (of which chances are less) then
there is always a chance that one misses changing this comment. I feel
keeping it generic in this case would be better as the pro
On Wed, Jan 10, 2024 at 3:30 PM Alexander Lakhin wrote:
10.01.2024 12:31, Amit Kapila wrote:
I am slightly hesitant to add any particular system table name in the
comments as this can happen for any other system table as well, so
slightly adjusted the comments in the attached. However, I think it is
okay to mention the particular system table name in
On Tue, Jan 9, 2024 at 4:30 PM Alexander Lakhin wrote:
Hello Amit,
09.01.2024 13:08, Amit Kapila wrote:
As to checkpoint_timeout, personally I would not increase it, because it
seems unbelievable to me that pg_restore (with the cluster containing only
two empty databases) can run for longer than 5 minutes. I'd rather
investigate such a situation sep
On Tue, Jan 9, 2024 at 2:30 PM Alexander Lakhin wrote:
Hello Kuroda-san,
09.01.2024 08:49, Hayato Kuroda (Fujitsu) wrote:
Based on the suggestion by Amit, I have created a patch with the alternative
approach. This just does GUC settings. The reported failure is only for
003_logical_slots, but the patch also includes changes for the recently added
te
Dear Amit, Alexander,
> > We get the effect discussed when the background writer process decides to
> > flush a file buffer for pg_largeobject during stage 1.
> > (Thus, if a checkpoint somehow happened to occur during CREATE DATABASE,
> > the result must be the same.)
> > And another important fa
On Mon, Jan 8, 2024 at 9:36 PM Jim Nasby wrote:
On 1/4/24 10:19 PM, Amit Kapila wrote:
On Thu, Jan 4, 2024 at 5:30 PM Alexander Lakhin wrote:
03.01.2024 14:42, Amit Kapila wrote:
And the internal process is ... background writer (BgBufferSync()).
So, I tried just adding bgwriter_lru_maxpages = 0 to postgresql.conf and
got 20 x 10 tests passing.
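For reference, the server settings discussed in this thread could be sketched as the fragment below. This is only an illustration of the idea, not the committed test fix; the exact values and mechanism used in the patch may differ.

```
# Hypothetical postgresql.conf fragment for the pg_upgrade test servers.
# bgwriter_lru_maxpages = 0 stops the background writer from flushing
# buffers, which is what makes an unlinked-but-still-open pg_largeobject
# file resurface as a "delete pending" handle on Windows.
bgwriter_lru_maxpages = 0
# A long checkpoint_timeout keeps checkpointer writes out of the
# pg_restore window as well (the value here is illustrative).
checkpoint_timeout = 1h
```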
On Sun, 29 Oct 2023 at 11:14, Hayato Kuroda (Fujitsu)
wrote:
On Thu, Jan 4, 2024 at 5:30 PM Alexander Lakhin wrote:
Hello Amit,
03.01.2024 14:42, Amit Kapila wrote:
So I started to think about another approach: to perform unlink as it's
implemented now, but then wait until the DELETE_PENDING state is gone.
There is a comment in the code which suggests we shouldn't wait
indefinitely. See "However, we won't w
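The wait-until-the-delete-completes idea can be sketched portably as a bounded retry loop. This is a simplification I wrote for illustration; PostgreSQL's real pgunlink() in src/port/dirmod.c handles more cases, and the function name, retry count, and sleep interval here are mine, not the patch's.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical sketch: retry unlink() while the file is still held
 * open elsewhere (on Windows this surfaces as EACCES while the delete
 * is pending), giving up after max_tries attempts.  ENOENT counts as
 * success: the file is already gone. */
static int
unlink_with_retry(const char *path, int max_tries)
{
    for (int i = 0; i < max_tries; i++)
    {
        if (unlink(path) == 0 || errno == ENOENT)
            return 0;           /* deleted, or already gone */
        if (errno != EACCES)
            return -1;          /* a real error: report it */
        usleep(100 * 1000);     /* wait 100 ms and try again */
    }
    return -1;                  /* timed out waiting for the delete */
}
```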
On Tue, Jan 2, 2024 at 10:30 AM Alexander Lakhin wrote:
Hello Kuroda-san,
Dear Alexander,
> I agree with your analysis and would like to propose a PoC fix (see
> attached). With this patch applied, 20 iterations succeeded for me.
There are no reviewers, so I will review it again myself. Let's move the PoC
to a concrete patch. Note that I only focused on fixes of random fa
Hello Andrew and Kuroda-san,
27.11.2023 16:58, Andrew Dunstan wrote:
It would also be interesting to know the full OS version/build on drongo and
fairywren.
It's WS 2019 1809/17763.4252. The latest available AFAICT is 17763.5122
I've updated it to 17763.5122 now.
Thank you for the information!
Dear Alexander, Andrew,
Thanks for your analysis!
> I see that behavior on:
> Windows 10 Version 1607 (OS Build 14393.0)
> Windows Server 2016 Version 1607 (OS Build 14393.0)
> Windows Server 2019 Version 1809 (OS Build 17763.1)
>
> But it's not reproduced on:
> Windows 10 Version 1809 (OS Buil
On 2023-11-27 Mo 07:39, Andrew Dunstan wrote:
On 2023-11-27 Mo 07:00, Alexander Lakhin wrote:
Hello Kuroda-san,
25.11.2023 18:19, Hayato Kuroda (Fujitsu) wrote:
Thanks for attaching a program. This helps us to understand the issue.
I wanted to confirm your env - this failure occurred on Windows Server,
right?
I see that behavior on:
Windows 10 Version 1607 (OS Build 14393.0)
Dear Alexander,
>
> Please look at the simple test program attached. It demonstrates the
> failure for me when running in two sessions as follows:
> unlink-open test 150 1000
> unlink-open test2 150 1000
Thanks for attaching a program. This helps us to understand the issue.
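The test program itself is an attachment not reproduced in this archive. A rough guess at its shape, based only on the command lines quoted above (a file name, a size, and an iteration count), might look like this; the function name and argument interpretation are my assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical reconstruction: repeatedly create, write, close and
 * unlink a file.  Run two copies with different names concurrently
 * (as in "unlink-open test 150 1000" / "unlink-open test2 150 1000")
 * to stress the filesystem; on affected Windows versions, the open
 * eventually fails while the other file's delete is still pending. */
static int
unlink_open_loop(const char *name, int size_kb, int iterations)
{
    char block[1024];

    memset(block, 'x', sizeof(block));
    for (int i = 0; i < iterations; i++)
    {
        FILE *f = fopen(name, "wb");

        if (f == NULL)
            return i + 1;       /* report the failing iteration */
        for (int k = 0; k < size_kb; k++)
            fwrite(block, 1, sizeof(block), f);
        fclose(f);
        if (remove(name) != 0)
            return -(i + 1);    /* unlink itself failed */
    }
    return 0;                   /* all iterations succeeded */
}
```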
I wa
Hello Kuroda-san,
23.11.2023 15:15, Hayato Kuroda (Fujitsu) wrote:
I agree with your analysis and would like to propose a PoC fix (see
attached). With this patch applied, 20 iterations succeeded for me.
Thanks, here are some comments. I'm not quite sure about Windows, so I may say
something wron
Dear Alexander,
>
> I can easily reproduce this failure on my workstation by running 5 tests
> 003_logical_slots in parallel inside a Windows VM with its CPU resources
> limited to 50%, like so:
> VBoxManage controlvm "Windows" cpuexecutioncap 50
>
> set PGCTLTIMEOUT=180
> python3 -c "NUMITERATIO
Hello Kuroda-san,
21.11.2023 13:37, Hayato Kuroda (Fujitsu) wrote:
Oh, sorry. I forgot to attach the files. You can see pg_upgrade_server.log for now.
I can easily reproduce this fa
Dear hackers,
This email gives an update. The machine drongo failed the test a week ago [1]
and we finally got log files. PSA files.
## Observed failure
pg_upgrade_server.log is a server log during the pg_upgrade command. According
to it, the TRUNCATE command seemed to fail due to a "File exis
Dear hackers,
While tracking the buildfarm, I found that drongo failed the test
pg_upgrade/003_logical_slots [1].
A strange point is that the test passed in the next iteration. Currently I'm not
sure of the reason, but I will keep an eye on it and will investigate if it
happens again.
I think this f
Dear Andres,
While tracking BF failures related to pg_upgrade, I found that the same failure
has still happened [1] - [4].
According to the log, the output directory remained even after a
successful upgrade [5].
I analyzed and attached the fix patch, and below is my analysis... how do you
th
On 2023-02-06 14:14:22 -0800, Andres Freund wrote:
> On 2023-02-07 11:03:18 +1300, Thomas Munro wrote:
> > What I see is that there were 1254 FreeBSD tasks run in that window, of
> > which 163 failed, and (more interestingly) 111 of those failures succeeded
> > on every other platform. And clickin
Hi,
On 2023-02-07 11:03:18 +1300, Thomas Munro wrote:
On Tue, Feb 7, 2023 at 11:03 AM Thomas Munro wrote:
On Tue, Feb 7, 2023 at 10:57 AM Andres Freund wrote:
Hi,
On February 6, 2023 1:51:20 PM PST, Thomas Munro wrote:
>Next up: the new "running" tests, spuriously failing around 8.8% of CI
>builds on FreeBSD. I'll go and ping that thread...
Is that rate unchanged? I thought I fixed the main issue last week?
Greetings,
Andres
On Wed, Feb 1, 2023 at 2:44 PM Thomas Munro wrote:
> OK, I pushed that. Third time lucky?
I pulled down logs for a week of Windows CI, just over ~1k builds.
The failure rate was a few per day before, but there are no failures
like that after that went in. There are logs that contain the
"Direct
On Wed, Feb 1, 2023 at 10:08 AM Thomas Munro wrote:
> On Wed, Feb 1, 2023 at 10:04 AM Andres Freund wrote:
> > Maybe we should just handle it by sleeping and retrying, if on windows? Sad
> > to even propose...
>
> Yeah, that's what that code I posted would do automatically, though
> it's a bit h
On Wed, Feb 1, 2023 at 9:54 AM Thomas Munro wrote:
> ... I have one more idea ...
I also had a second idea, barely good enough to mention and probably
just paranoia. In a nearby thread I learned that process exit does
not release Windows advisory file locks synchronously, which surprised
this Un
On Wed, Feb 1, 2023 at 10:04 AM Andres Freund wrote:
> On January 31, 2023 12:54:42 PM PST, Thomas Munro
> wrote:
> >I'm not sure about anything, but if that's what's happening here, then
> >maybe the attached would help. In short, it would make the previous
> >theory true (the idea of a second
Hi,
On January 31, 2023 12:54:42 PM PST, Thomas Munro wrote:
On Wed, Feb 1, 2023 at 6:28 AM Justin Pryzby wrote:
> > I pushed the rmtree() change. Let's see if that helps, or tells us
> > something new.
>
> I found a few failures since then:
>
> https://api.cirrus-ci.com/v1/artifact/task/6696942420361216/testrun/build/testrun/pg_upgrade/002_pg_upgrade/log/
On Tue, Jan 31, 2023 at 02:00:05PM +1300, Thomas Munro wrote:
On Thu, Jan 5, 2023 at 4:11 PM Thomas Munro wrote:
On Wed, Dec 7, 2022 at 7:15 AM Andres Freund wrote:
> What is our plan here? This afaict is the most common "false positive" for
> cfbot in the last weeks
Hi,
On 2022-11-08 01:16:09 +1300, Thomas Munro wrote:
On Tue, Nov 08, 2022 at 01:16:09AM +1300, Thomas Munro wrote:
I took the liberty of adding a CF entry for this
https://commitfest.postgresql.org/41/4011/
And afterwards figured I could be a little bit wasteful
So [1] on its own didn't fix this. My next guess is that the attached
might help.
Hmm. Following Michael's clue that this might involve log files and
pg_ctl, I noticed one thing: pg_ctl implements
wait_for_postmaster_stop() by waiting for kill(pid, 0) to fail, and
our kill emulation does CallNam
Hi,
On 2022-10-17 23:31:44 -0500, Justin Pryzby wrote:
On Tue, Oct 18, 2022 at 01:06:15PM +0900, Michael Paquier wrote:
On Tue, Oct 18, 2022 at 09:47:37AM +1300, Thomas Munro wrote:
> * Server 2019, as used on CI, still uses the traditional NT semantics
> (unlink is asynchronous, when all handles closes)
> * the fix I proposed has the right effect (I will follow up with tests
> to demonstrate)
Wow, nice investigati
On Mon, Oct 3, 2022 at 7:29 PM Michael Paquier wrote:
Some thi
On Mon, Oct 03, 2022 at 04:03:12PM +1300, Thomas Munro wrote:
> So I think that setting is_lnk = false is good enough here. Do
> you see a hole in it?
I cannot think of one, off the top of my head. Thanks for the
explanation.
--
Michael
On Mon, Oct 3, 2022 at 1:40 PM Michael Paquier wrote:
On Mon, Oct 03, 2022 at 12:10:06PM +1300, Thomas Munro wrote:
> I think something like the attached should do the right thing for
> STATUS_DELETE_PENDING (sort of "ENOENT-in-progress"). unlink() goes
> back to being blocking (sleep+retry until eventually we reach ENOENT
> or we time out and give u
On Mon, Oct 3, 2022 at 9:07 AM Thomas Munro wrote:
On Tue, Sep 20, 2022 at 1:31 PM Justin Pryzby wrote:
> I suspect that rmtree() was looping in pgunlink(), and got ENOENT, so
> didn't warn about the file itself, but then failed one moment later in
> rmdir.
Yeah, I think this is my fault. In commit f357233c the new lstat()
call might return ENOE
Hi,
On 2022-09-27 11:47:37 +0530, Bharath Rupireddy wrote:
> Just for the records - the same issue was also seen here [1], [2].
>
> [1] https://cirrus-ci.com/task/5709014662119424?logs=check_world#L82
> [2]
> https://api.cirrus-ci.com/v1/artifact/task/5709014662119424/testrun/build/testrun/pg_up
On Tue, Sep 20, 2022 at 7:01 AM Justin Pryzby wrote:
On Mon, Sep 19, 2022 at 02:32:17PM -0700, Andres Freund wrote:
> Hi,
>
> After my last rebase of the meson tree I encountered the following test
> failure:
>
> https://cirrus-ci.com/task/5532444261613568
>
> [20:23:04.171] - 8<
> -
On Mon, Sep 19, 2022 at 06:13:17PM -0700, Andres Freund wrote:
> I don't really see what'd race with what here? pg_upgrade has precise control
> over what's happening here, no?
A code path could have forgotten an fclose(), for example, but this code
is rather old and close-proof as far as I know. M
Hi,
On 2022-09-20 10:08:41 +0900, Michael Paquier wrote:
On Mon, Sep 19, 2022 at 02:32:17PM -0700, Andres Freund wrote:
> I don't know if actually related to the commit below, but there've been a
> lot of runs of the pg_upgrade tests in the meson branch, and this is the first
> failure of this kind. Unfortunately the error seems to be transient -
> rerun
Hi,
After my last rebase of the meson tree I encountered the following test
failure:
https://cirrus-ci.com/task/5532444261613568
[20:23:04.171] - 8<
-
[20:23:04.171] stderr:
[20:23:04.171] # Failed test 'pg_upgrade_output