On Mon, Apr 24, 2023 at 10:53:35AM +0200, Christoph Berg wrote:
> Re: Andres Freund
> > Add smgrzeroextend(), FileZero(), FileFallocate()
> 
> Hi,
> 
> I'm often seeing PG16 builds erroring out in the pgbench tests:
> 
> 00:33:12 make[2]: Entering directory '/<<PKGBUILDDIR>>/build/src/bin/pgbench'
> 00:33:12 echo "# +++ tap check in src/bin/pgbench +++" && rm -rf 
> '/<<PKGBUILDDIR>>/build/src/bin/pgbench'/tmp_check && /bin/mkdir -p 
> '/<<PKGBUILDDIR>>/build/src/bin/pgbench'/tmp_check && cd 
> /<<PKGBUILDDIR>>/build/../src/bin/pgbench && 
> TESTLOGDIR='/<<PKGBUILDDIR>>/build/src/bin/pgbench/tmp_check/log' 
> TESTDATADIR='/<<PKGBUILDDIR>>/build/src/bin/pgbench/tmp_check' 
> PATH="/<<PKGBUILDDIR>>/build/tmp_install/usr/lib/postgresql/16/bin:/<<PKGBUILDDIR>>/build/src/bin/pgbench:$PATH"
>  
> LD_LIBRARY_PATH="/<<PKGBUILDDIR>>/build/tmp_install/usr/lib/aarch64-linux-gnu"
>   PGPORT='65432' 
> top_builddir='/<<PKGBUILDDIR>>/build/src/bin/pgbench/../../..' 
> PG_REGRESS='/<<PKGBUILDDIR>>/build/src/bin/pgbench/../../../src/test/regress/pg_regress'
>  /usr/bin/prove -I /<<PKGBUILDDIR>>/build/../src/test/perl/ -I 
> /<<PKGBUILDDIR>>/build/../src/bin/pgbench --verbose t/*.pl
> 00:33:12 # +++ tap check in src/bin/pgbench +++
> 00:33:14 #   Failed test 'concurrent OID generation status (got 2 vs expected 
> 0)'
> 00:33:14 #   at t/001_pgbench_with_server.pl line 31.
> 00:33:14 #   Failed test 'concurrent OID generation stdout /(?^:processed: 
> 125/125)/'
> 00:33:14 #   at t/001_pgbench_with_server.pl line 31.
> 00:33:14 #                   'pgbench (16devel (Debian 
> 16~~devel-1.pgdg100+~20230423.1656.g8bbd0cc))
> 00:33:14 # transaction type: 
> /<<PKGBUILDDIR>>/build/src/bin/pgbench/tmp_check/t_001_pgbench_with_server_main_data/001_pgbench_concurrent_insert
> 00:33:14 # scaling factor: 1
> 00:33:14 # query mode: prepared
> 00:33:14 # number of clients: 5
> 00:33:14 # number of threads: 1
> 00:33:14 # maximum number of tries: 1
> 00:33:14 # number of transactions per client: 25
> 00:33:14 # number of transactions actually processed: 118/125
> 00:33:14 # number of failed transactions: 0 (0.000%)
> 00:33:14 # latency average = 26.470 ms
> 00:33:14 # initial connection time = 66.583 ms
> 00:33:14 # tps = 188.889760 (without initial connection time)
> 00:33:14 # '
> 00:33:14 #     doesn't match '(?^:processed: 125/125)'
> 00:33:14 #   Failed test 'concurrent OID generation stderr /(?^:^$)/'
> 00:33:14 #   at t/001_pgbench_with_server.pl line 31.
> 00:33:14 #                   'pgbench: error: client 2 script 0 aborted in 
> command 0 query 0: ERROR:  could not extend file "base/5/3501" with 
> FileFallocate(): Interrupted system call
> 00:33:14 # HINT:  Check free disk space.
> 00:33:14 # pgbench: error: Run was aborted; the above results are incomplete.
> 00:33:14 # '
> 00:33:14 #     doesn't match '(?^:^$)'
> 00:33:26 # Looks like you failed 3 tests of 428.
> 00:33:26 t/001_pgbench_with_server.pl ..
> 00:33:26 not ok 1 - concurrent OID generation status (got 2 vs expected 0)
> 
> I don't think the disk is full since it's always hitting that same
> spot, on some of the builds:
> 
> https://pgdgbuild.dus.dg-i.net/job/postgresql-16-binaries-snapshot/833/
> 
> This is overlayfs with tmpfs (upper)/ext4 (lower). Manually running
> that test works though, and the FS seems to support posix_fallocate:
> 
> #include <fcntl.h>
> #include <stdio.h>
> 
> int main ()
> {
>         int f;
>         int err;
> 
>         if (!(f = open("moo", O_CREAT | O_RDWR, 0666)))
>                 perror("open");
> 
>         err = posix_fallocate(f, 0, 10);
>         perror("posix_fallocate");
> 
>         return 0;
> }
> 
> $ ./a.out
> posix_fallocate: Success
> 
> The problem has been there for some weeks - I didn't report it earlier
> as I was on vacation, in the free time trying to bootstrap s390x
> support for apt.pg.o, and there was this other direct IO problem
> making all the builds fail for some time.

I noticed that dsm_impl_posix_resize() calls posix_fallocate() in a
do/while loop that retries while rc == EINTR, and FileFallocate()
doesn't. From what the comment in dsm_impl_posix_resize() says and some
cursory googling, posix_fallocate() isn't restarted automatically on
most systems, so a retry loop on EINTR is often used. Is there a reason
FileFallocate() doesn't use one?

- Melanie

