Re: Test of a partition with an incomplete detach has a timing issue

2021-05-27 Thread Noah Misch
On Tue, May 25, 2021 at 11:32:38AM -0400, Alvaro Herrera wrote:
> On 2021-May-24, Noah Misch wrote:
> > What if we had a standard that the step after the cancel shall send a query to
> > the backend that just received the cancel? Something like:
>
> Hmm ... I don't understand why this fixes

Re: Test of a partition with an incomplete detach has a timing issue

2021-05-25 Thread Alvaro Herrera
On 2021-May-25, Tom Lane wrote:
> Alvaro Herrera writes:
> > The problem disappears completely if I add a sleep to the cancel query:
> > step "s1cancel" { SELECT pg_cancel_backend(pid), pg_sleep(0.01) FROM d3_pid; }
> > I suppose a 0.01 second sleep is not going to be sufficient to close

Re: Test of a partition with an incomplete detach has a timing issue

2021-05-25 Thread Tom Lane
Alvaro Herrera writes:
> The problem disappears completely if I add a sleep to the cancel query:
> step "s1cancel" { SELECT pg_cancel_backend(pid), pg_sleep(0.01) FROM d3_pid; }
> I suppose a 0.01 second sleep is not going to be sufficient to close the
> problem in slower animals, but I h

Re: Test of a partition with an incomplete detach has a timing issue

2021-05-25 Thread Alvaro Herrera
So I had a hard time reproducing the problem, until I realized that I needed to limit the server to use only one CPU, and in addition run some other stuff concurrently in the same server in order to keep it busy. With that, I see about one failure every 10 runs. So I start the server as "numactl -

Re: Test of a partition with an incomplete detach has a timing issue

2021-05-24 Thread Noah Misch
On Mon, May 24, 2021 at 09:12:40PM -0400, Tom Lane wrote:
> The experiments I did awhile ago are coming back to me now. I tried
> a number of variations on this same theme, and none of them closed
> the gap entirely. The fundamental problem is that it's possible
> for backend A to complete its tr

Re: Test of a partition with an incomplete detach has a timing issue

2021-05-24 Thread Tom Lane
Michael Paquier writes:
> On Mon, May 24, 2021 at 02:07:12PM -0400, Alvaro Herrera wrote:
>> Maybe we can change the "cancel" query to something like
>> SELECT pg_cancel_backend(pid), somehow_wait_for_detach_to_terminate() FROM d3_pid;
>> ... where maybe that function can check the "state" col
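
For readers skimming the thread, the "wait until the other backend is done" idea quoted above can be pictured with a small polling loop. This is only a sketch, not anything posted in the thread: `wait_until_not_active`, `get_state`, and the `"active"` sentinel are hypothetical names chosen for illustration, and `get_state` stands in for whatever query would read the "state" value the quoted text mentions. Note that elsewhere in this thread Tom Lane reports that variations on this theme did not close the gap entirely, so treat it as a picture of the proposal rather than a fix.

```python
import time
from typing import Callable


def wait_until_not_active(
    get_state: Callable[[], str],
    timeout: float = 5.0,
    interval: float = 0.01,
) -> bool:
    """Poll get_state() until it stops returning "active".

    get_state is a hypothetical callable standing in for a query that
    reports the state of the cancelled backend. Returns True if a state
    other than "active" was seen before the timeout, else False.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Check first, so an already-finished backend returns at once.
        if get_state() != "active":
            return True
        # Give up once the deadline has passed rather than wait forever.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With a fake state source such as `states = iter(["active", "active", "idle"])`, calling `wait_until_not_active(lambda: next(states))` returns once the third value is read. Unlike a fixed `pg_sleep(0.01)`, a loop like this scales its wait to how slow the machine is, but it still only observes the state at polling instants.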

Re: Test of a partition with an incomplete detach has a timing issue

2021-05-24 Thread Michael Paquier
On Mon, May 24, 2021 at 02:07:12PM -0400, Alvaro Herrera wrote:
> I suppose a fix would imply that the error report waits until after the
> "cancel" step is over, but I'm not sure how to do that.
>
> Maybe we can change the "cancel" query to something like
>
> SELECT pg_cancel_backend(pid), someh

RE: Test of a partition with an incomplete detach has a timing issue

2021-05-24 Thread osumi.takami...@fujitsu.com
On Tuesday, May 25, 2021 3:07 AM Alvaro Herrera wrote:
> On 2021-May-24, osumi.takami...@fujitsu.com wrote:
>
> > Also, I've gotten some logs left.
> > * src/test/isolation/output_iso/regression.out
> >
> > test detach-partition-concurrently-1 ... ok 682 ms
> > test detach-partition-conc

Re: Test of a partition with an incomplete detach has a timing issue

2021-05-24 Thread Tom Lane
Alvaro Herrera writes:
> On 2021-May-24, osumi.takami...@fujitsu.com wrote:
>> t
>> -step s2detach: <... completed>
>> -error in steps s1cancel s2detach: ERROR: canceling statement due to user request
>> step s1c: COMMIT;
>> +step s2detach: <... completed>
>> +error in steps s1c

Re: Test of a partition with an incomplete detach has a timing issue

2021-05-24 Thread Alvaro Herrera
On 2021-May-24, osumi.takami...@fujitsu.com wrote:
> Also, I've gotten some logs left.
> * src/test/isolation/output_iso/regression.out
>
> test detach-partition-concurrently-1 ... ok 682 ms
> test detach-partition-concurrently-2 ... ok 321 ms
> test detach-partition-concurrentl