Re: Adding CI to our tree

2022-09-06 Thread Thomas Munro
On Wed, Oct 6, 2021 at 5:01 PM Thomas Munro  wrote:
> BTW, on those two OSes there are some messages like this each time a
> submake dumps its output to the log:
>
> [03:36:16.591] fcntl(): Bad file descriptor
>
> It seems worth putting up with these compared to the alternatives of
> either not using -j, not using -Otarget and having the output of
> parallel tests all mashed up and unreadable (that still happen
> sometimes but it's unlikely, because the submakes write() whole output
> chunks at infrequent intervals), or redirecting to a file so you can't
> see the realtime test output on the main CI page (not so fun, you have
> to wait until it's finished and view it as an 'artifact').  I tried to
> write a patch for GNU make to fix that[1], let's see if something
> happens.
>
> [1] https://savannah.gnu.org/bugs/?52922

For the record, GNU make finally fixed this problem (though Andres
found a workaround anyway), but in any case it won't be in the
relevant package repos before we switch over to Meson/Ninja :-)




Re: Adding CI to our tree

2022-03-30 Thread Andres Freund
Hi,

Now that zstd is used, enable it in CI. I plan to commit this shortly, unless
somebody sees a reason not to do so.

Greetings,

Andres Freund
>From 51cc830e2e82516966fe1c84cd134b1858a89bab Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 26 Mar 2022 12:36:04 -0700
Subject: [PATCH v1] ci: enable zstd where available.

Now that zstd is used, enable it in CI.

Discussion: https://postgr.es/m/20211001222752.wrz7erzh4cajv...@alap3.anarazel.de
---
 .cirrus.yml | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 171bd29cf03..7b883b462a8 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -101,6 +101,7 @@ task:
 --with-ssl=openssl \
 --with-tcl --with-tclconfig=/usr/local/lib/tcl8.6/ \
 --with-uuid=bsd \
+--with-zstd \
 \
 --with-includes=/usr/local/include \
 --with-libs=/usr/local/lib \
@@ -142,6 +143,7 @@ LINUX_CONFIGURE_FEATURES: _CONFIGURE_FEATURES >-
   --with-systemd
   --with-tcl --with-tclconfig=/usr/lib/tcl8.6/
   --with-uuid=ossp
+  --with-zstd
 
 
 task:
@@ -270,7 +272,8 @@ task:
   openldap \
   openssl \
   python \
-  tcl-tk
+  tcl-tk \
+  zstd
 
 brew cleanup -s # to reduce cache size
   upload_caches: homebrew
@@ -282,7 +285,7 @@ task:
 INCLUDES="${brewpath}/include:${INCLUDES}"
 LIBS="${brewpath}/lib:${LIBS}"
 
-for pkg in icu4c krb5 openldap openssl ; do
+for pkg in icu4c krb5 openldap openssl zstd ; do
   pkgpath="${brewpath}/opt/${pkg}"
   INCLUDES="${pkgpath}/include:${INCLUDES}"
   LIBS="${pkgpath}/lib:${LIBS}"
@@ -307,6 +310,7 @@ task:
   --with-ssl=openssl \
   --with-tcl --with-tclconfig=${brewpath}/opt/tcl-tk/lib/ \
   --with-uuid=e2fs \
+  --with-zstd \
   \
   --prefix=${HOME}/install \
   --with-includes="${INCLUDES}" \
-- 
2.35.1.677.gabf474a5dd



Re: Adding CI to our tree

2022-03-23 Thread Justin Pryzby
On Thu, Mar 24, 2022 at 09:52:39AM +1300, Thomas Munro wrote:
> On Thu, Mar 10, 2022 at 9:37 AM Justin Pryzby  wrote:
> > -Og
> 
> Adding this to CXXFLAGS caused a torrent of warnings from g++ about
> LLVM headers, which I also see on my local system for LLVM 11 and LLVM
> 14:

Yes, I mentioned seeing some other warnings here.
20220302205058.gj15...@telsasoft.com

I think warnings were cleaned up with -O0/1/2 but not with -Og.




Re: Adding CI to our tree

2022-03-23 Thread Thomas Munro
On Thu, Mar 10, 2022 at 9:37 AM Justin Pryzby  wrote:
> -Og

Adding this to CXXFLAGS caused a torrent of warnings from g++ about
LLVM headers, which I also see on my local system for LLVM 11 and LLVM
14:

[19:47:11.047] /usr/lib/llvm-11/include/llvm/ADT/Twine.h: In member
function ‘llvm::CallInst*
llvm::IRBuilderBase::CreateCall(llvm::FunctionType*, llvm::Value*,
llvm::ArrayRef, const llvm::Twine&, llvm::MDNode*)’:
[19:47:11.047] /usr/lib/llvm-11/include/llvm/ADT/Twine.h:229:16:
warning: ‘.llvm::Twine::LHS.llvm::Twine::Child::twine’ may
be used uninitialized in this function [-Wmaybe-uninitialized]
[19:47:11.047] 229 | !LHS.twine->isBinary())
[19:47:11.047] | ^

https://cirrus-ci.com/task/5597526098182144?logs=build#L6

Not sure who to complain to about that...




Re: Adding CI to our tree

2022-03-23 Thread Andres Freund
Hi,

On 2022-03-22 23:14:23 -0500, Justin Pryzby wrote:
> On Fri, Mar 18, 2022 at 03:45:03PM -0700, Andres Freund wrote:
> > Pushed 0001, 0002. Only change I made was to add
> 
> Thanks - is there any reason not to do the MSVC compiler warnings one, too ?

Purely a lack of round tuits. IIRC I thought there was a small aspect that
still needed some polishing, but didn't have time to tackle it yet.

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-03-22 Thread Justin Pryzby
On Fri, Mar 18, 2022 at 03:45:03PM -0700, Andres Freund wrote:
> Pushed 0001, 0002. Only change I made was to add

Thanks - is there any reason not to do the MSVC compiler warnings one, too ?

I see that it'll warn about issues with at least 3 patches (including one of
yours).

-- 
Justin




Re: Adding CI to our tree

2022-03-18 Thread Andres Freund
Hi,

Pushed 0001, 0002. Only change I made was to add
DEBIAN_FRONTEND=noninteractive to the apt-get invocations, because some
packages will fail / complain verbosely if there's no interactive prompt
during installation.

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-03-10 Thread Justin Pryzby
See attached, or at
https://github.com/justinpryzby/postgres/runs/5503079878
>From c631c3d9bdb8325aaaecc5dcdfac46eca7bd907a Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 17 Jan 2022 00:53:04 -0600
Subject: [PATCH 1/7] cirrus: include hints how to install OS packages..

This is useful for patches during development, but once a feature is merged,
new libraries should be added to the OS image files, rather than installed
during every CI run forever into the future.
---
 .cirrus.yml | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 40854046d66..6026a973dbb 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -73,10 +73,12 @@ task:
 chown -R postgres:postgres .
 mkdir -p ${CCACHE_DIR}
 chown -R postgres:postgres ${CCACHE_DIR}
-  setup_cores_script: |
+  setup_core_files_script: |
 mkdir -m 770 /tmp/cores
 chown root:postgres /tmp/cores
 sysctl kern.corefile='/tmp/cores/%N.%P.core'
+  setup_additional_packages_script: |
+#pkg install -y ...
 
   # NB: Intentionally build without --with-llvm. The freebsd image size is
   # already large enough to make VM startup slow, and even without llvm
@@ -180,10 +182,13 @@ task:
 chown -R postgres:postgres ${CCACHE_DIR}
 echo '* - memlock 134217728' > /etc/security/limits.d/postgres.conf
 su postgres -c "ulimit -l -H && ulimit -l -S"
-  setup_cores_script: |
+  setup_core_files_script: |
 mkdir -m 770 /tmp/cores
 chown root:postgres /tmp/cores
 sysctl kernel.core_pattern='/tmp/cores/%e-%s-%p.core'
+  setup_additional_packages_script: |
+#apt-get update
+#apt-get -y install ...
 
   configure_script: |
 su postgres <<-EOF
@@ -237,7 +242,7 @@ task:
 ulimit -a -H && ulimit -a -S
 export
 
-  setup_cores_script:
+  setup_core_files_script:
 - mkdir ${HOME}/cores
 - sudo sysctl kern.corefile="${HOME}/cores/core.%P"
 
@@ -254,7 +259,7 @@ task:
   # packages do not need to be downloaded.
   homebrew_cache:
 folder: $HOMEBREW_CACHE
-  homebrew_install_script: |
+  setup_additional_packages_script: |
 brew install \
   ccache \
   icu4c \
@@ -389,6 +394,9 @@ task:
 powershell -Command get-psdrive -psprovider filesystem
 set
 
+  setup_additional_packages_script: |
+REM choco install -y --no-progress ...
+
   configure_script:
 # copy errors out when using forward slashes
 - copy src\tools\ci\windows_build_config.pl src\tools\msvc\config.pl
@@ -484,6 +492,10 @@ task:
   ccache_cache:
 folder: $CCACHE_DIR
 
+  setup_additional_packages_script: |
+#apt-get update
+#apt-get -y install ...
+
   ###
   # Test that code can be built with gcc/clang without warnings
   ###
-- 
2.17.1

>From 9e5df76468c4ecc8f411f106c86d54c1f06f70f3 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 27 Feb 2022 15:17:50 -0600
Subject: [PATCH 2/7] cirrus: compile with -Og..

To improve performance of check-world, and improve debugging, without
significantly slower builds (they're cached anyway).

This makes freebsd check-world run in 8.5 minutes rather than 15 minutes.
And on linux mitigates the run-time performance effect of --coverage.
---
 .cirrus.yml | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 6026a973dbb..5179353dc9d 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -107,7 +107,7 @@ task:
 \
 CC="ccache cc" \
 CXX="ccache c++" \
-CFLAGS="-O0 -ggdb"
+CFLAGS="-Og -ggdb"
 EOF
   build_script: su postgres -c "gmake -s -j${BUILD_JOBS} world-bin"
   upload_caches: ccache
@@ -201,8 +201,8 @@ task:
 CC="ccache gcc" \
 CXX="ccache g++" \
 CLANG="ccache clang" \
-CFLAGS="-O0 -ggdb" \
-CXXFLAGS="-O0 -ggdb"
+CFLAGS="-Og -ggdb" \
+CXXFLAGS="-Og -ggdb"
 EOF
   build_script: su postgres -c "make -s -j${BUILD_JOBS} world-bin"
   upload_caches: ccache
@@ -315,8 +315,8 @@ task:
   CC="ccache cc" \
   CXX="ccache c++" \
   CLANG="ccache ${brewpath}/llvm/bin/ccache" \
-  CFLAGS="-O0 -ggdb" \
-  CXXFLAGS="-O0 -ggdb" \
+  CFLAGS="-Og -ggdb" \
+  CXXFLAGS="-Og -ggdb" \
   \
   LLVM_CONFIG=${brewpath}/llvm/bin/llvm-config \
   PYTHON=python3
-- 
2.17.1

>From 92f64c297628ebb7026b6b61795444ea078720d2 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 20 Feb 2022 15:01:59 -0600
Subject: [PATCH 3/7] cirrus/windows: add compiler_warnings_script

I'm not sure how to write this test in windows shell; it's also not easy to
write it in posix sh, since windows shell is somehow interpretting && and ||...

ci-os-only: windows

https://cirrus-ci.com/task/6183879907213312
https://cirrus-ci.com/task/4876271443247104
---
 .cirrus.yml|  9 -
 src/tools/ci/windows-compiler-warnings | 16 
 2 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100755 

Re: Adding CI to our tree

2022-03-10 Thread Justin Pryzby
On Thu, Mar 10, 2022 at 12:50:15PM -0800, Andres Freund wrote:
> > -  setup_cores_script: |
> > +  setup_os_script: |
> >  mkdir -m 770 /tmp/cores
> >  chown root:postgres /tmp/cores
> >  sysctl kern.corefile='/tmp/cores/%N.%P.core'
> > +#pkg install -y ...
> 
> Would you mind if I split this into setup_core_files_script and
> setup_additional_packages_script:?

That's fine.  FYI I'm also planning on using choco install --no-progress
I could resend my latest patches shortly.

> > Subject: [PATCH 6/7] wip: cirrus/windows: add compiler_warnings_script
> > 
> > I'm not sure how to write this test in windows shell; it's also not easy to
> > write it in posix sh, since windows shell is somehow interpretting && and 
> > ||...
> 
> That comment isn't accurate anymore now that it's in an external script,
> right?

No, it is accurate.  What I mean is that it's also hard to write it as a
1-liner using posix sh, since the || (and &&) seemed to be interpretted by
cmd.exe and needed escaping - gross.

-- 
Justin




Re: Adding CI to our tree

2022-03-10 Thread Andres Freund
Hi,

On 2022-03-02 14:50:58 -0600, Justin Pryzby wrote:
> From 883edaa653bcf7f1a2369d8edf46eaaac1ba0ba2 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Mon, 17 Jan 2022 00:53:04 -0600
> Subject: [PATCH 1/7] cirrus: include hints how to install OS packages..
> 
> This is useful for patches during development, but once a feature is merged,
> new libraries should be added to the OS image files, rather than installed
> during every CI run forever into the future.
> ---
>  .cirrus.yml | 16 +---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/.cirrus.yml b/.cirrus.yml
> index d10b0a82f9f..1b7c36283e9 100644
> --- a/.cirrus.yml
> +++ b/.cirrus.yml
> @@ -73,10 +73,11 @@ task:
>  chown -R postgres:postgres .
>  mkdir -p ${CCACHE_DIR}
>  chown -R postgres:postgres ${CCACHE_DIR}
> -  setup_cores_script: |
> +  setup_os_script: |
>  mkdir -m 770 /tmp/cores
>  chown root:postgres /tmp/cores
>  sysctl kern.corefile='/tmp/cores/%N.%P.core'
> +#pkg install -y ...

Would you mind if I split this into setup_core_files_script and
setup_additional_packages_script:?


> +  # The commit that this branch is rebased on.  There's no easy way to find 
> this.
> +  # This does the right thing for cfbot, which always squishes all patches 
> into a single commit.
> +  # And does the right thing for any 1-patch commits.
> +  # Patches series manually submitted to cirrus may benefit from setting this
> +  # to the number of patches in the series (or directly to the commit the 
> series was rebased on).
> +  BASE_COMMIT: HEAD~1

Still think that something using
  git merge-base $CIRRUS_LAST_GREEN_CHANGE HEAD

might be better. With a bit of error handling for unset
CIRRUS_LAST_GREEN_CHANGE and for git not seeing enough history for
CIRRUS_LAST_GREEN_CHANGE.


> +apt-get update
> +apt-get -y install lcov

FWIW, I just added that to the install script used for the container / VM
image build. So it'll be pre-installed once that completes.

https://cirrus-ci.com/build/5818788821073920


> From feceea4413b84f478e6a0888cdfab4be1c80767a Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Sun, 20 Feb 2022 15:01:59 -0600
> Subject: [PATCH 6/7] wip: cirrus/windows: add compiler_warnings_script
> 
> I'm not sure how to write this test in windows shell; it's also not easy to
> write it in posix sh, since windows shell is somehow interpretting && and 
> ||...

That comment isn't accurate anymore now that it's in an external script,
right?


Greetings,

Andres Freund




Re: Adding CI to our tree

2022-03-09 Thread Thomas Munro
On Thu, Mar 10, 2022 at 4:33 PM Andres Freund  wrote:
> On 2022-03-10 15:43:16 +1300, Thomas Munro wrote:
> > I'm confused.
>
> The "terrible IO wait" thing was before we reduced the number of CPUs and
> concurrent jobs. It makes sense to me that with just two CPUs we're CPU bound,
> in which case -Og obviously can make a difference.

Oh, duh, yeah, that makes sense when you put it that way and
considering the CPU graph:

-O0: https://cirrus-ci.com/task/4578631912521728
-Og: https://cirrus-ci.com/task/5239486182326272




Re: Adding CI to our tree

2022-03-09 Thread Andres Freund
Hi,

On 2022-03-10 15:43:16 +1300, Thomas Munro wrote:
> Wow, I see the effect on Cirrus -- test_world ran in 8:55 instead of
> 12:43 when I tried (terrible absolute times, but fantastic
> improvement!).  Hmm, on my local FreeBSD 13 box I saw 5:07 -> 5:03
> with this change.  My working theory had been that there is something
> bad happening in the I/O stack under concurrency making FreeBSD on
> Cirrus/GCP very slow (ie patterns to stall on slow cloud I/O waits,
> something I hope to dig into when post-freeze round tuits present
> themselves), but that doesn't gel with this huge improvement from
> noodling with optimiser details, and I don't know why I don't see
> something similar locally.  I'm confused.

The "terrible IO wait" thing was before we reduced the number of CPUs and
concurrent jobs. It makes sense to me that with just two CPUs we're CPU bound,
in which case -Og obviously can make a difference.


> Just BTW it's kinda funny that we say -ggdb for macOS and then we use
> lldb to debug cores in cores_backtrace.sh.  I suppose it would be more
> correct to say -glldb, but doubt it matters much...

Yea. I used -ggdb because I didn't know -glldb existed :). And there's also
the advantage that -ggdb works both with gcc and clang, whereas -glldb doesn't
seem to be known to gcc.

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-03-09 Thread Thomas Munro
On Thu, Mar 10, 2022 at 9:37 AM Justin Pryzby  wrote:
> On Wed, Mar 09, 2022 at 10:12:54AM -0800, Andres Freund wrote:
> > On 2022-03-09 11:47:23 -0600, Justin Pryzby wrote:
> > > I'm curious what you think of this patch.
> > >
> > > It makes check-world on freebsd over 30% faster - saving 5min.
> >
> > That's nice! While -Og makes interactive debugging noticeably harder IME, 
> > it's
> > not likely to be a large enough difference just for backtraces etc.
>
> Yeah.  gcc(1) claims that -Og can improve debugging.

Wow, I see the effect on Cirrus -- test_world ran in 8:55 instead of
12:43 when I tried (terrible absolute times, but fantastic
improvement!).  Hmm, on my local FreeBSD 13 box I saw 5:07 -> 5:03
with this change.  My working theory had been that there is something
bad happening in the I/O stack under concurrency making FreeBSD on
Cirrus/GCP very slow (ie patterns to stall on slow cloud I/O waits,
something I hope to dig into when post-freeze round tuits present
themselves), but that doesn't gel with this huge improvement from
noodling with optimiser details, and I don't know why I don't see
something similar locally.  I'm confused.

Just BTW it's kinda funny that we say -ggdb for macOS and then we use
lldb to debug cores in cores_backtrace.sh.  I suppose it would be more
correct to say -glldb, but doubt it matters much...




Re: Adding CI to our tree

2022-03-09 Thread Justin Pryzby
On Wed, Mar 09, 2022 at 10:12:54AM -0800, Andres Freund wrote:
> On 2022-03-09 11:47:23 -0600, Justin Pryzby wrote:
> > I'm curious what you think of this patch.
> > 
> > It makes check-world on freebsd over 30% faster - saving 5min.
> 
> That's nice! While -Og makes interactive debugging noticeably harder IME, it's
> not likely to be a large enough difference just for backtraces etc.

Yeah.  gcc(1) claims that -Og can improve debugging.

I should've mentioned that this seems to mitigate the performance effect of
--coverage on linux, too.

> I'm far less convinced that using "MaxSpeed" for the msvc build is a good
> idea. I've looked at one or two backtraces of optimized msvc builds and
> backtraces were quite a bit worse - and they're not great to start with.  What
> was the win there?

Did you compare FULL optimization or "favor speed/size" or "default"
optimization ?

It's worth trading some some build time (especially with a compiler cache) for
test time (especially with alltaptests).  But I didn't check backtraces, and I
didn't compare the various optimization options.  The argument may not be as
strong for windows, since it has no build cache (and it has no -Og).  We'd save
a bit more when also running the other tap tests.

CI runs are probably not very consistent, but I've just run
https://cirrus-ci.com/task/5236145167532032
master is the average of 4 patches at the top of cfbot.

/ master / patched / change
subscription/ 197s   / 195s/ +2s
recovery/ 234s   / 212s/ -22s
bin / 383s   / 373s/ -10s

-- 
Justin




Re: Adding CI to our tree

2022-03-09 Thread Andres Freund
Hi,

On 2022-03-09 11:47:23 -0600, Justin Pryzby wrote:
> I'm curious what you think of this patch.
> 
> It makes check-world on freebsd over 30% faster - saving 5min.

That's nice! While -Og makes interactive debugging noticeably harder IME, it's
not likely to be a large enough difference just for backtraces etc.

I'm far less convinced that using "MaxSpeed" for the msvc build is a good
idea. I've looked at one or two backtraces of optimized msvc builds and
backtraces were quite a bit worse - and they're not great to start with.  What
was the win there?

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-03-09 Thread Justin Pryzby
I'm curious what you think of this patch.

It makes check-world on freebsd over 30% faster - saving 5min.

Apparently gcc -Og was added in gcc 4.8 (c. 2013).

On Wed, Mar 02, 2022 at 02:50:58PM -0600, Justin Pryzby wrote:
> From d180953d273c221a30c5e9ad8d74b1b4dfc60bd1 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Sun, 27 Feb 2022 15:17:50 -0600
> Subject: [PATCH 7/7] cirrus: compile with -Og..
> 
> To improve performance of check-world, and improve debugging, without
> significantly slower builds (they're cached anyway).
> 
> This makes freebsd check-world run in 8.5 minutes rather than 15 minutes.
> ---
>  .cirrus.yml  | 12 +++-
>  src/tools/msvc/MSBuildProject.pm |  4 ++--
>  2 files changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/.cirrus.yml b/.cirrus.yml
> index 6f05d420c85..8b673bf58cf 100644
> --- a/.cirrus.yml
> +++ b/.cirrus.yml
> @@ -113,7 +113,7 @@ task:
>  \
>  CC="ccache cc" \
>  CXX="ccache c++" \
> -CFLAGS="-O0 -ggdb"
> +CFLAGS="-Og -ggdb"
>  EOF
>build_script: su postgres -c "gmake -s -j${BUILD_JOBS} world-bin"
>upload_caches: ccache
> @@ -208,8 +208,8 @@ task:
>  CC="ccache gcc" \
>  CXX="ccache g++" \
>  CLANG="ccache clang" \
> -CFLAGS="-O0 -ggdb" \
> -CXXFLAGS="-O0 -ggdb"
> +CFLAGS="-Og -ggdb" \
> +CXXFLAGS="-Og -ggdb"
>  EOF
>build_script: su postgres -c "make -s -j${BUILD_JOBS} world-bin"
>upload_caches: ccache
> @@ -329,8 +329,8 @@ task:
>CC="ccache cc" \
>CXX="ccache c++" \
>CLANG="ccache ${brewpath}/llvm/bin/ccache" \
> -  CFLAGS="-O0 -ggdb" \
> -  CXXFLAGS="-O0 -ggdb" \
> +  CFLAGS="-Og -ggdb" \
> +  CXXFLAGS="-Og -ggdb" \
>\
>LLVM_CONFIG=${brewpath}/llvm/bin/llvm-config \
>PYTHON=python3
> @@ -383,6 +383,8 @@ task:
>  # -fileLoggerParameters1: write to msbuild.warn.log.
>  MSBFLAGS: -m -verbosity:minimal 
> "-consoleLoggerParameters:Summary;ForceNoAlign" /p:TrackFileAccess=false 
> -nologo -fileLoggerParameters1:warningsonly;logfile=msbuild.warn.log
>  
> +MSBUILD_OPTIMIZE: MaxSpeed
> +
>  # If tests hang forever, cirrus eventually times out. In that case log
>  # output etc is not uploaded, making the problem hard to debug. Of course
>  # tests internally should have shorter timeouts, but that's proven to not
> diff --git a/src/tools/msvc/MSBuildProject.pm 
> b/src/tools/msvc/MSBuildProject.pm
> index 5e312d232e9..05e0c41eb5c 100644
> --- a/src/tools/msvc/MSBuildProject.pm
> +++ b/src/tools/msvc/MSBuildProject.pm
> @@ -85,7 +85,7 @@ EOF
>   $f, 'Debug',
>   {
>   defs=> "_DEBUG;DEBUG=1",
> - opt => 'Disabled',
> + opt => $ENV{MSBUILD_OPTIMIZE} || 'Disabled',
>   strpool => 'false',
>   runtime => 'MultiThreadedDebugDLL'
>   });
> @@ -94,7 +94,7 @@ EOF
>   'Release',
>   {
>   defs=> "",
> - opt => 'Full',
> + opt => $ENV{MSBUILD_OPTIMIZE} || 'Full',
>   strpool => 'true',
>   runtime => 'MultiThreadedDLL'
>   });
> -- 
> 2.17.1
> 




Re: Adding CI to our tree (ccache)

2022-03-07 Thread Justin Pryzby
On Mon, Mar 07, 2022 at 11:10:54AM -0800, Andres Freund wrote:
> On 2022-03-06 10:16:54 -0600, Justin Pryzby wrote:
> > On Fri, Mar 04, 2022 at 05:30:03PM -0800, Andres Freund wrote:
> > > I tried to use it, but saw that no caching was happening, and debugged
> > > it. Which yielded that it can't be used due to the way output files are
> > > specified (and due to multiple files, but that can be prevented with an
> > > msbuild parameter).
> > 
> > Could you give a hint about to the msbuild param to avoid processing 
> > multiple
> > files with cl.exe?  I'm not able to find it...
> 
> /p:UseMultiToolTask=true

Wow - thanks ;)

> > I don't know about the issue with output filenames ..
> 
> I don't remember the precise details anymore, but it boils down to ccache
> requiring the output filename to be specified, but we only specify the output
> directory. Or very similar.

There's already a problem report and PR for this.
I didn't test it, but I hope it'll be fixed in their next minor release.

https://github.com/ccache/ccache/issues/1018

-- 
Justin




Re: Adding CI to our tree (ccache)

2022-03-07 Thread Andres Freund
Hi,

On 2022-03-06 10:16:54 -0600, Justin Pryzby wrote:
> On Fri, Mar 04, 2022 at 05:30:03PM -0800, Andres Freund wrote:
> > I tried to use it, but saw that no caching was happening, and debugged
> > it. Which yielded that it can't be used due to the way output files are
> > specified (and due to multiple files, but that can be prevented with an
> > msbuild parameter).
> 
> Could you give a hint about to the msbuild param to avoid processing multiple
> files with cl.exe?  I'm not able to find it...

/p:UseMultiToolTask=true


> I don't know about the issue with output filenames ..

I don't remember the precise details anymore, but it boils down to ccache
requiring the output filename to be specified, but we only specify the output
directory. Or very similar.

Greetings,

Andres Freund




Re: Adding CI to our tree (ccache)

2022-03-06 Thread Justin Pryzby
On Fri, Mar 04, 2022 at 05:30:03PM -0800, Andres Freund wrote:
> I tried to use it, but saw that no caching was happening, and debugged
> it. Which yielded that it can't be used due to the way output files are
> specified (and due to multiple files, but that can be prevented with an
> msbuild parameter).

Could you give a hint about to the msbuild param to avoid processing multiple
files with cl.exe?  I'm not able to find it...

I don't know about the issue with output filenames ..

-- 
Justin




Re: Adding CI to our tree (ccache)

2022-03-04 Thread Andres Freund
Hi,

On 2022-03-03 22:56:15 -0600, Justin Pryzby wrote:
> On Sun, Feb 20, 2022 at 12:47:31PM -0800, Andres Freund wrote:
> > On 2022-02-20 13:36:55 -0600, Justin Pryzby wrote:
> > > Have you tried to use the yet-to-be-released ccache with MSVC ?
> > 
> > Yes, it doesn't work, because it requires cl.exe to be used in a specific 
> > way
> > (only a single input file, specific output file naming). Which would 
> > require a
> > decent amount of changes to src/tools/msvc. I think it's more realistic with
> > meson etc.
> 
> Did you get to the point that that causes a problem, or did you just realize
> that it was a limitation that seems to preclude its use ?

I tried to use it, but saw that no caching was happening, and debugged
it. Which yielded that it can't be used due to the way output files are
specified (and due to multiple files, but that can be prevented with an
msbuild parameter).


> If so, could you send the branch/commit you had ?

This was in a local VM, not cirrus. I ended up making it work with the meson
build, after a bit of fiddling. Although bypassing msbuild (by building with
ninja, using cl.exe) is a larger win...


> The error I'm getting when I try to use ccache involves .rst files, which 
> don't
> exist (and which ccache doesn't know how to find or ignore).
> https://cirrus-ci.com/task/5441491957972992

ccache has code to deal with response files. I suspect the problem here is
rather that ccache expects the compiler as an argument.


> I gather this is the difference between "compiling with MSVC" and compiling
> with a visual studio project.

I doubt it's related to that.

Greetings,

Andres Freund




Re: Adding CI to our tree (ccache)

2022-03-03 Thread Justin Pryzby
On Sun, Feb 20, 2022 at 12:47:31PM -0800, Andres Freund wrote:
> On 2022-02-20 13:36:55 -0600, Justin Pryzby wrote:
> > Have you tried to use the yet-to-be-released ccache with MSVC ?
> 
> Yes, it doesn't work, because it requires cl.exe to be used in a specific way
> (only a single input file, specific output file naming). Which would require a
> decent amount of changes to src/tools/msvc. I think it's more realistic with
> meson etc.

Did you get to the point that that causes a problem, or did you just realize
that it was a limitation that seems to preclude its use ?  If so, could you
send the branch/commit you had ?

The error I'm getting when I try to use ccache involves .rst files, which don't
exist (and which ccache doesn't know how to find or ignore).
https://cirrus-ci.com/task/5441491957972992

I gather this is the difference between "compiling with MSVC" and compiling
with a visual studio project.




Re: Adding CI to our tree

2022-03-02 Thread Justin Pryzby
On Mon, Feb 28, 2022 at 02:58:02PM -0600, Justin Pryzby wrote:
> I still think that if "Build Docs" is a separate cirrus task, it should 
> rebuild
> docs on every CI run, even if they haven't changed, for any patch that touches
> docs/.  It'll be confusing if cfbot shows 5 green circles and 4 of them were
> built 1 day ago, and 1 was built 3 weeks ago.  Docs are the task that runs
> quickest, so I don't think it's worth doing anything special there (especially
> without understanding the behavior of changesInclude()).
> 
> Also, to allow users to view the built HTML docs, cfbot would need to 1) keep
> track of previous CI runs; and 2) logic to handle "skipped" CI runs, to allow
> showing artifacts from the previous run.  If it's not already done, I think 
> the
> first half is a good idea on its own.  But the 2nd part doesn't seem 
> desirable.

Maybe changesInclude() could work if we use this URL (from cirrus'
documentation), which uses the artifacts from the last successful build.
https://api.cirrus-ci.com/v1/artifact/github/justinpryzby/postgres/Documentation/html_docs/html_docs/00-doc.html?branch=citest-cirrus2

That requires knowing the file being modified, so we'd have to generate an
index of changed files - which I've started doing here.

> However, I realized that we can filter on cfbot with either of these:
> | $CIRRUS_CHANGE_TITLE =~ '^\[CF...'
> | git log -1 |grep '^Author: Commitfest Bot '
> If we can assume that cfbot will continue submitting branches as a single
> patch, this resolves the question of a "base branch", for cfbot.

I don't know what you think of that idea, but I think I want to amend my
proposal: show HTML and coverage artifacts for HEAD~1, unless set otherwise by
an environment var.  Today, that'd do the right thing for cfbot, and also for
any 1-patch commits.

> These patches implement that idea, and make "code coverage" and "HTML diffs"
> stuff only run for cfbot commits.  This still needs another round of testing,
> though.

The patch was missing a file due to an issue while rebasing - oops.

BTW (regarding the last patch), I just noticed that -Og optimization can cause
warnings with gcc-4.8.5-39.el7.x86_64.

be-fsstubs.c: In function 'be_lo_export':
be-fsstubs.c:522:24: warning: 'fd' may be used uninitialized in this function 
[-Wmaybe-uninitialized]
  if (CloseTransientFile(fd) != 0)
^
trigger.c: In function 'ExecCallTriggerFunc':
trigger.c:2400:2: warning: 'result' may be used uninitialized in this function 
[-Wmaybe-uninitialized]
  return (HeapTuple) DatumGetPointer(result);
  ^
xml.c: In function 'xml_pstrdup_and_free':
xml.c:1205:2: warning: 'result' may be used uninitialized in this function 
[-Wmaybe-uninitialized]
  return result;

-- 
Justin
>From 883edaa653bcf7f1a2369d8edf46eaaac1ba0ba2 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 17 Jan 2022 00:53:04 -0600
Subject: [PATCH 1/7] cirrus: include hints how to install OS packages..

This is useful for patches during development, but once a feature is merged,
new libraries should be added to the OS image files, rather than installed
during every CI run forever into the future.
---
 .cirrus.yml | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index d10b0a82f9f..1b7c36283e9 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -73,10 +73,11 @@ task:
 chown -R postgres:postgres .
 mkdir -p ${CCACHE_DIR}
 chown -R postgres:postgres ${CCACHE_DIR}
-  setup_cores_script: |
+  setup_os_script: |
 mkdir -m 770 /tmp/cores
 chown root:postgres /tmp/cores
 sysctl kern.corefile='/tmp/cores/%N.%P.core'
+#pkg install -y ...
 
   # NB: Intentionally build without --with-llvm. The freebsd image size is
   # already large enough to make VM startup slow, and even without llvm
@@ -180,10 +181,12 @@ task:
 chown -R postgres:postgres ${CCACHE_DIR}
 echo '* - memlock 134217728' > /etc/security/limits.d/postgres.conf
 su postgres -c "ulimit -l -H && ulimit -l -S"
-  setup_cores_script: |
+  setup_os_script: |
 mkdir -m 770 /tmp/cores
 chown root:postgres /tmp/cores
 sysctl kernel.core_pattern='/tmp/cores/%e-%s-%p.core'
+#apt-get update
+#apt-get -y install ...
 
   configure_script: |
 su postgres <<-EOF
@@ -237,7 +240,7 @@ task:
 ulimit -a -H && ulimit -a -S
 export
 
-  setup_cores_script:
+  setup_os_script:
 - mkdir ${HOME}/cores
 - sudo sysctl kern.corefile="${HOME}/cores/core.%P"
 
@@ -384,6 +387,9 @@ task:
 powershell -Command get-psdrive -psprovider filesystem
 set
 
+  setup_os_script: |
+REM choco install -y ...
+
   configure_script:
 # copy errors out when using forward slashes
 - copy src\tools\ci\windows_build_config.pl src\tools\msvc\config.pl
@@ -479,6 +485,10 @@ task:
   ccache_cache:
 folder: $CCACHE_DIR
 
+  setup_os_script: |
+#apt-get update
+#apt-get -y install ...
+
   ###
   # Test that code can be built with 

Re: Adding CI to our tree

2022-02-28 Thread Justin Pryzby
On Sat, Feb 26, 2022 at 08:08:38PM -0800, Andres Freund wrote:
> On 2022-02-26 21:10:57 -0600, Justin Pryzby wrote:
> > If someone renames or removes an xref target, shouldn't CI fail on its next
> > run for a patch which tries to reference it ?
> 
> Why wouldn't it?

I suppose you're right - I was thinking that cirrus was checking whether the
*patch* had changed any matching files, but it probably checks (as it should)
whether "the sources" have changed.

Hmm, it's behaving strangely...if there's a single argument ('docs/**'), then
it will skip the docs task if I resubmit it after git commit --amend --no-edit.
But with multiple args ('.cirrus.yaml', 'docs/**') it reruns it...
I tried it as skip: !changesInclude() and by adding it to the existing only_if:
(.. || ..) && changesInclude().

> > Are you sure about cfbot ?  AIUI cirrus would see that docs didn't change
> > relative to the previous run for branch: commitfest/NN/.
> 
> Not entirely sure, but it's what I remember observing when ammending commits
> in a repo using changesInclues. If I were to build a feature like it I'd look
> at the list of files of
>   git diff $(git merge-base last_green new_commit)..new_commit
> 
> or such.  cfbot doesn't commit incrementally but replaces the prior commit, so
> I suspect it'll always be viewn as new. But who knows, shouldn't be hard to
> figure out.

Anyway...

I still think that if "Build Docs" is a separate cirrus task, it should rebuild
docs on every CI run, even if they haven't changed, for any patch that touches
docs/.  It'll be confusing if cfbot shows 5 green circles and 4 of them were
built 1 day ago, and 1 was built 3 weeks ago.  Docs are the task that runs
quickest, so I don't think it's worth doing anything special there (especially
without understanding the behavior of changesInclude()).

Also, to allow users to view the built HTML docs, cfbot would need to 1) keep
track of previous CI runs; and 2) logic to handle "skipped" CI runs, to allow
showing artifacts from the previous run.  If it's not already done, I think the
first half is a good idea on its own.  But the 2nd part doesn't seem desirable.

However, I realized that we can filter on cfbot with either of these:
| $CIRRUS_CHANGE_TITLE =~ '^\[CF...'
| git log -1 |grep '^Author: Commitfest Bot '
If we can assume that cfbot will continue submitting branches as a single
patch, this resolves the question of a "base branch", for cfbot.

(Actually, I'd prefer if it preserved the original patches as separate commits,
but that isn't what it does).  

These patches implement that idea, and make "code coverage" and "HTML diffs"
stuff only run for cfbot commits.  This still needs another round of testing,
though.

-- 
Justin

PS. I've just done this.  I'm unsure whether to say that it's wonderful or
terrible.  This would certainly be better if each branch preserved the original
set of patches.

$ git remote add cfbot https://github.com/postgresql-cfbot/postgresql
$ git fetch cfbot
$ git branch -a |grep -c cfbot
5417
>From 883edaa653bcf7f1a2369d8edf46eaaac1ba0ba2 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 17 Jan 2022 00:53:04 -0600
Subject: [PATCH 1/6] cirrus: include hints how to install OS packages..

This is useful for patches during development, but once a feature is merged,
new libraries should be added to the OS image files, rather than installed
during every CI run forever into the future.
---
 .cirrus.yml | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index d10b0a82f9f..1b7c36283e9 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -73,10 +73,11 @@ task:
 chown -R postgres:postgres .
 mkdir -p ${CCACHE_DIR}
 chown -R postgres:postgres ${CCACHE_DIR}
-  setup_cores_script: |
+  setup_os_script: |
 mkdir -m 770 /tmp/cores
 chown root:postgres /tmp/cores
 sysctl kern.corefile='/tmp/cores/%N.%P.core'
+#pkg install -y ...
 
   # NB: Intentionally build without --with-llvm. The freebsd image size is
   # already large enough to make VM startup slow, and even without llvm
@@ -180,10 +181,12 @@ task:
 chown -R postgres:postgres ${CCACHE_DIR}
 echo '* - memlock 134217728' > /etc/security/limits.d/postgres.conf
 su postgres -c "ulimit -l -H && ulimit -l -S"
-  setup_cores_script: |
+  setup_os_script: |
 mkdir -m 770 /tmp/cores
 chown root:postgres /tmp/cores
 sysctl kernel.core_pattern='/tmp/cores/%e-%s-%p.core'
+#apt-get update
+#apt-get -y install ...
 
   configure_script: |
 su postgres <<-EOF
@@ -237,7 +240,7 @@ task:
 ulimit -a -H && ulimit -a -S
 export
 
-  setup_cores_script:
+  setup_os_script:
 - mkdir ${HOME}/cores
 - sudo sysctl kern.corefile="${HOME}/cores/core.%P"
 
@@ -384,6 +387,9 @@ task:
 powershell -Command get-psdrive -psprovider filesystem
 set
 
+  setup_os_script: |
+REM choco install -y ...
+
   configure_script:
 # copy errors out when using forward slashes

Re: Adding CI to our tree

2022-02-26 Thread Andres Freund
Hi,

On 2022-02-26 21:10:57 -0600, Justin Pryzby wrote:
> I did git commit --amend --no-edit and repushed to github to trigger a new CI
> run, and it did this: https://github.com/justinpryzby/postgres/runs/5347878714
>
> This is in a branch with changes to doc.  I wasn't intending it to skip
> building docs on this branch just because the same, changed docs were
> previously built.

But why not? If nothing in docs/ changes, there's little point? It'd probably
be good to check .cirrus.yml and docs/**, to make sure that .cirrus logic
changes rerun as well.


> Why wouldn't the docs be built following the same logic as the rest of the
> sources?

Tests have a certain rate of spurious failure, so rerunning them makes
sense. But what do we gain by rebuilding the docs? And especially, what do we
gain about uploading the docs e.g. in the postgres/postgres repo?


> If someone renames or removes an xref target, shouldn't CI fail on its next
> run for a patch which tries to reference it ?

Why wouldn't it?


> It would fail on the buildfarm, and I think one major use for the CI is to
> minimize the post-push cleanup cycles.

I personally see it more as access to a "mini buildfarm" before patches are
committable, but that's of course compatible.


> Are you sure about cfbot ?  AIUI cirrus would see that docs didn't change
> relative to the previous run for branch: commitfest/NN/.

Not entirely sure, but it's what I remember observing when ammending commits
in a repo using changesInclues. If I were to build a feature like it I'd look
at the list of files of
  git diff $(git merge-base last_green new_commit)..new_commit

or such.  cfbot doesn't commit incrementally but replaces the prior commit, so
I suspect it'll always be viewn as new. But who knows, shouldn't be hard to
figure out.

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-02-26 Thread Justin Pryzby
On Sat, Feb 26, 2022 at 06:50:00PM -0800, Andres Freund wrote:
> Hi,
> 
> On 2022-02-26 20:43:52 -0600, Justin Pryzby wrote:
> > This doesn't do the right thing - I just tried.
> > https://cirrus-ci.org/guide/writing-tasks/#environment-variables
> > | changesInclude function can be very useful for skipping some tasks when 
> > no changes to sources have been made since the last successful Cirrus CI 
> > build.
> 
> > That means it will not normally rebuild docs (and then this still requires
> > resolving the "base branch").
> 
> Why would we want to rebuild docs if they're the same as in the last build for
> the same branch?  For cfbot purposes each commit is independent from the prior
> commit, so it should rebuild it every time if the CF entry has changes to the
> docs.

I did git commit --amend --no-edit and repushed to github to trigger a new CI
run, and it did this: https://github.com/justinpryzby/postgres/runs/5347878714

This is in a branch with changes to doc.  I wasn't intending it to skip
building docs on this branch just because the same, changed docs were
previously built.

Why wouldn't the docs be built following the same logic as the rest of the
sources ?  If someone renames or removes an xref target, shouldn't CI fail on
its next run for a patch which tries to reference it ?  It would fail on the
buildfarm, and I think one major use for the CI is to minimize the post-push
cleanup cycles.

Are you sure about cfbot ?  AIUI cirrus would see that docs didn't change
relative to the previous run for branch: commitfest/NN/.

-- 
Justin




Re: Adding CI to our tree

2022-02-26 Thread Andres Freund
Hi,

On 2022-02-26 20:43:52 -0600, Justin Pryzby wrote:
> This doesn't do the right thing - I just tried.
> https://cirrus-ci.org/guide/writing-tasks/#environment-variables
> | changesInclude function can be very useful for skipping some tasks when no 
> changes to sources have been made since the last successful Cirrus CI build.

> That means it will not normally rebuild docs (and then this still requires
> resolving the "base branch").

Why would we want to rebuild docs if they're the same as in the last build for
the same branch?  For cfbot purposes each commit is independent from the prior
commit, so it should rebuild it every time if the CF entry has changes to the
docs.

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-02-26 Thread Justin Pryzby
On Sat, Feb 26, 2022 at 05:09:08PM -0800, Andres Freund wrote:
> > XXX: if this is run in the same task, the configure flags should probably be
> > consistent ?
> 
> What do you mean?

I mean that commit to run CompilerWarnings unconditionally built docs with
different flags than the other stuff in that task.  If it's going to be a
separate task, then that doesn't matter.

> > +# Verify docs can be built, and upload changed docs as artifacts
> > +task:
> > +  name: HTML docs
> > +
> > +  env:
> > +CPUS: 1
> > +
> > +  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || 
> > $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*(docs|html).*'
> > +
> > +  container:
> > +image: $CONTAINER_REPO/linux_debian_bullseye_ci:latest
> > +cpu: $CPUS
> > +
> 
> how about using something like (the syntax might be slightly off)
>   skip: !changesInclude('doc/**')
> to avoid running it for the many pushes where no docs are changed?

This doesn't do the right thing - I just tried.
https://cirrus-ci.org/guide/writing-tasks/#environment-variables
| changesInclude function can be very useful for skipping some tasks when no 
changes to sources have been made since the last successful Cirrus CI build.

That means it will not normally rebuild docs (and then this still requires
resolving the "base branch").

-- 
Justin




Re: Adding CI to our tree

2022-02-26 Thread Andres Freund
On 2022-02-26 17:09:08 -0800, Andres Freund wrote:
> You could put the script in src/tools/ci and call it from the script to avoid
> the quoting issues.

Might also be a good idea for the bulk of the docs / coverage stuff, even if
there are no quoting issues.




Re: Adding CI to our tree

2022-02-26 Thread Andres Freund
Hi,

> Subject: [PATCH 2/7] cirrus: upload changed html docs as artifacts

I still think the determination of the base branch needs to be resolved before
this can be considered.


> Always run doc build; to allow them to be shown in cfbot, they should not be
> skipped if the linux build fails.
> 
> This could be done on the client side (cfbot).  One advantage of doing it here
> is that fewer docs are uploaded - many patches won't upload docs at all.

Imo this stuff is largely independent from the commit subject


> XXX: if this is run in the same task, the configure flags should probably be
> consistent ?

What do you mean?


> Subject: [PATCH 3/7] s!build docs as a separate task..

Could you reorder this to earlier, then we can merge it before resolving the
branch issues. And we don't waffle on the CompilerWarnings dependency.


> I believe this'll automatically show up as a separate "column" on the cfbot
> page.

Yup.


> +# Verify docs can be built, and upload changed docs as artifacts
> +task:
> +  name: HTML docs
> +
> +  env:
> +CPUS: 1
> +
> +  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || 
> $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*(docs|html).*'
> +
> +  container:
> +image: $CONTAINER_REPO/linux_debian_bullseye_ci:latest
> +cpu: $CPUS
> +

how about using something like (the syntax might be slightly off)
  skip: !changesInclude('doc/**')
to avoid running it for the many pushes where no docs are changed?


> +  sysinfo_script: |
> +id
> +uname -a
> +cat /proc/cmdline
> +ulimit -a -H && ulimit -a -S
> +export
> +
> +git remote -v
> +git branch -a
> +git remote add postgres https://github.com/postgres/postgres
> +time git fetch -v postgres master
> +git log -1 postgres/master
> +git diff --name-only postgres/master..

Hardly "sysinfo"?


> Subject: [PATCH 4/7] wip: cirrus: code coverage
> 
> XXX: lcov should be installed in the OS image

FWIW, you can open a PR in https://github.com/anarazel/pg-vm-images/
both the debian docker and VM have their packages installed
via scripts/linux_debian_install_deps.sh


> From 226699150e3e224198fc297689add21bece51c4b Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Sun, 9 Jan 2022 18:25:02 -0600
> Subject: [PATCH 5/7] cirrus/vcregress: test modules/contrib with
>  NO_INSTALLCHECK=1

I don't want to commit the vcregress.pl part myself. But if you split off I'm
happy to push the --temp-config bits.


> From 08933bcd93d4f57ad73ab6df2f1627b93e61b459 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Sun, 16 Jan 2022 12:51:13 -0600
> Subject: [PATCH 6/7] wip: cirrus/windows: save tmp_install as an artifact..
> 
> ..to allow users to easily download compiled binaries to try a patch.
> If they don't have a development environment handy or not familiar with
> compilation.
> 
> XXX: maybe this should be conditional or commented out ?

Yea, I don't want to do this by default, that's a fair bit of data that very
likely nobody will ever access. One can make entire tasks triggered manually,
but that'd then require building again :/.



> From a7d2bba6f51d816412fb645b0d4821c36ee5c400 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Sun, 20 Feb 2022 15:01:59 -0600
> Subject: [PATCH 7/7] wip: cirrus/windows: add compiler_warnings_script
> 
> I'm not sure how to write this test in windows shell; it's also not easy to
> write it in posix sh, since windows shell is somehow interpretting && and 
> ||...

You could put the script in src/tools/ci and call it from the script to avoid
the quoting issues.

Would be good to add a comment explaining the fileLoggerParameters1 thing and
a warning that compiler_warnings_script should be the last script.


Greetings,

Andres Freund




Re: Adding CI to our tree

2022-02-25 Thread Justin Pryzby
This is the other half of my CI patches, which are unrelated to the TAP ones on
the other thread.
>From 88c01c09ee26db2817629265fc12b2dbcd8c9a91 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 17 Jan 2022 00:53:04 -0600
Subject: [PATCH 1/7] cirrus: include hints how to install OS packages..

This is useful for patches during development, but once a feature is merged,
new libraries should be added to the OS image files, rather than installed
during every CI run forever into the future.
---
 .cirrus.yml | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index d10b0a82f9..1b7c36283e 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -73,10 +73,11 @@ task:
 chown -R postgres:postgres .
 mkdir -p ${CCACHE_DIR}
 chown -R postgres:postgres ${CCACHE_DIR}
-  setup_cores_script: |
+  setup_os_script: |
 mkdir -m 770 /tmp/cores
 chown root:postgres /tmp/cores
 sysctl kern.corefile='/tmp/cores/%N.%P.core'
+#pkg install -y ...
 
   # NB: Intentionally build without --with-llvm. The freebsd image size is
   # already large enough to make VM startup slow, and even without llvm
@@ -180,10 +181,12 @@ task:
 chown -R postgres:postgres ${CCACHE_DIR}
 echo '* - memlock 134217728' > /etc/security/limits.d/postgres.conf
 su postgres -c "ulimit -l -H && ulimit -l -S"
-  setup_cores_script: |
+  setup_os_script: |
 mkdir -m 770 /tmp/cores
 chown root:postgres /tmp/cores
 sysctl kernel.core_pattern='/tmp/cores/%e-%s-%p.core'
+#apt-get update
+#apt-get -y install ...
 
   configure_script: |
 su postgres <<-EOF
@@ -237,7 +240,7 @@ task:
 ulimit -a -H && ulimit -a -S
 export
 
-  setup_cores_script:
+  setup_os_script:
 - mkdir ${HOME}/cores
 - sudo sysctl kern.corefile="${HOME}/cores/core.%P"
 
@@ -384,6 +387,9 @@ task:
 powershell -Command get-psdrive -psprovider filesystem
 set
 
+  setup_os_script: |
+REM choco install -y ...
+
   configure_script:
 # copy errors out when using forward slashes
 - copy src\tools\ci\windows_build_config.pl src\tools\msvc\config.pl
@@ -479,6 +485,10 @@ task:
   ccache_cache:
 folder: $CCACHE_DIR
 
+  setup_os_script: |
+#apt-get update
+#apt-get -y install ...
+
   ###
   # Test that code can be built with gcc/clang without warnings
   ###
-- 
2.17.1

>From 7a80dfccb17f454849679ebe9abc89bd0901828a Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 15 Jan 2022 11:27:28 -0600
Subject: [PATCH 2/7] cirrus: upload changed html docs as artifacts

Always run doc build; to allow them to be shown in cfbot, they should not be
skipped if the linux build fails.

This could be done on the client side (cfbot).  One advantage of doing it here
is that fewer docs are uploaded - many patches won't upload docs at all.

XXX: if this is run in the same task, the configure flags should probably be
consistent ?

https://cirrus-ci.com/task/5396696388599808

ci-os-only: linux
---
 .cirrus.yml | 46 ++
 1 file changed, 34 insertions(+), 12 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 1b7c36283e..f21b249b6f 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -450,10 +450,6 @@ task:
 task:
   name: CompilerWarnings
 
-  # To limit unnecessary work only run this once the normal linux test succeeds
-  depends_on:
-- Linux - Debian Bullseye
-
   env:
 CPUS: 4
 BUILD_JOBS: 4
@@ -482,6 +478,13 @@ task:
 clang -v
 export
 
+git remote -v
+git branch -a
+git remote add postgres https://github.com/postgres/postgres
+time git fetch -v postgres master
+git log -1 postgres/master
+git diff --name-only postgres/master..
+
   ccache_cache:
 folder: $CCACHE_DIR
 
@@ -557,16 +560,35 @@ task:
   ###
   # Verify docs can be built
   ###
-  # XXX: Only do this if there have been changes in doc/ since last build
   always:
 docs_build_script: |
-  time ./configure \
---cache gcc.cache \
-CC="ccache gcc" \
-CXX="ccache g++" \
-CLANG="ccache clang"
-  make -s -j${BUILD_JOBS} clean
-  time make -s -j${BUILD_JOBS} -C doc
+  # Do nothing if the patch doesn't update doc:
+  git diff --name-only --cherry-pick --exit-code postgres/master... doc && exit
+
+  # Exercise HTML and other docs:
+  ./configure >/dev/null
+  make -s -C doc
+  cp -r doc new-docs
+
+  # Build HTML docs from the parent commit
+  git checkout postgres/master -- doc
+  make -s -C doc clean
+  make -s -C doc html
+  cp -r doc old-docs
+
+  # Copy HTML which differ into html_docs
+  # This will show any files which differ from files generated from postgres/master.
+  # Commits to postgres/master which affect doc/ will cause more files to be
+  # included until the patch is rebased.
+  mkdir html_docs
+  cp new-docs/src/sgml/html/*.css new-docs/src/sgml/html/*.svg html_docs/
+  for f in 

Re: Adding CI to our tree (ccache)

2022-02-20 Thread Justin Pryzby
On Sun, Feb 20, 2022 at 12:47:31PM -0800, Andres Freund wrote:
> > Did you ever try to use clcache (or others) ?
> > 
> > When I tried, it refused to cache because of our debug settings
> > (DebugInformationFormat) - which seem to be enabled even in release mode.
> 
> > I wonder if that'll be an issue for ccache, too.  I think that line may 
> > need to
> > be conditional on debug mode.
> 
> That's relatively easily solvable by using a different debug format IIRC (/Z7
> or such).

Yes. I got that working for CI by overriding with a value from the environment.
https://cirrus-ci.com/task/6191974075072512

This is right after rebasing, so it doesn't save anything, but normally cuts
build time to 90sec, which isn't impressive, but it's something.

BTW, I think it's worth compiling the windows build with optimizations (as I
did here).  At least with all the tap tests, this pays for itself.  I suppose
you don't want to use a Release build, but optimizations could be enabled by
an(other) environment variable.

-- 
Justin




Re: Adding CI to our tree (ccache)

2022-02-20 Thread Andres Freund
Hi,

On 2022-02-20 13:36:55 -0600, Justin Pryzby wrote:
> Have you tried to use the yet-to-be-released ccache with MSVC ?

Yes, it doesn't work, because it requires cl.exe to be used in a specific way
(only a single input file, specific output file naming). Which would require a
decent amount of changes to src/tools/msvc. I think it's more realistic with
meson etc.


> Also, do you know about msbuild /outputResultsCache ?

I don't think it's really usable for what we need. But it's hard to tell.


> Did you ever try to use clcache (or others) ?
> 
> When I tried, it refused to cache because of our debug settings
> (DebugInformationFormat) - which seem to be enabled even in release mode.

> I wonder if that'll be an issue for ccache, too.  I think that line may need 
> to
> be conditional on debug mode.

That's relatively easily solvable by using a different debug format IIRC (/Z7
or such).

Greetings,

Andres Freund




Re: Adding CI to our tree (ccache)

2022-02-20 Thread Justin Pryzby
Have you tried to use the yet-to-be-released ccache with MSVC ?

Also, do you know about msbuild /outputResultsCache ?
When I tried that, it gave a bunch of error.

https://cirrus-ci.com/task/5697497241747456

|[16:35:13.605]  1>c:\cirrus\pgsql.sln.metaproj : error : MSB4252: Project 
"c:\cirrus\pgsql.sln" with global properties [c:\cirrus\pgsql.sln]
|[16:35:13.615] c:\cirrus\pgsql.sln.metaproj : error : 
(TrackFileAccess=false; CLToolExe=clcache.exe) [c:\cirrus\pgsql.sln]
|[16:35:13.615] c:\cirrus\pgsql.sln.metaproj : error : is building project 
"c:\cirrus\initdb.vcxproj" with global properties [c:\cirrus\pgsql.sln]
|[16:35:13.615] c:\cirrus\pgsql.sln.metaproj : error : 
(TrackFileAccess=false; CLToolExe=clcache.exe; BuildingSolutionFile=true; 
CurrentSolutionConfigurationContents= 
[c:\cirrus\pgsql.sln]
|[16:35:13.615] c:\cirrus\pgsql.sln.metaproj : error :   Debug|x64 
[c:\cirrus\pgsql.sln]
|...
|[16:35:14.518]c:\cirrus\pgsql.sln.metaproj : error :   
Debug|x64 
[c:\cirrus\pgsql.sln]
|[16:35:14.518]c:\cirrus\pgsql.sln.metaproj : error : 
; SolutionDir=c:\cirrus\; SolutionExt=.sln; 
SolutionFileName=pgsql.sln; SolutionName=pgsql; 
SolutionPath=c:\cirrus\pgsql.sln; Configuration=Debug; Platform=x64) 
[c:\cirrus\pgsql.sln]
|[16:35:14.518]c:\cirrus\pgsql.sln.metaproj : error : with the 
(default) target(s) but the build result for the built project is not in the 
engine cache. In isolated builds this could mean one of the following: 
[c:\cirrus\pgsql.sln]
|[16:35:14.518]c:\cirrus\pgsql.sln.metaproj : error : - the 
reference was called with a target which is not specified in the 
ProjectReferenceTargets item in project "c:\cirrus\pgsql.sln" 
[c:\cirrus\pgsql.sln]
|[16:35:14.518]c:\cirrus\pgsql.sln.metaproj : error : - the 
reference was called with global properties that do not match the static graph 
inferred nodes [c:\cirrus\pgsql.sln]
|[16:35:14.518]c:\cirrus\pgsql.sln.metaproj : error : - the 
reference was not explicitly specified as a ProjectReference item in project 
"c:\cirrus\pgsql.sln" [c:\cirrus\pgsql.sln]
|[16:35:14.518]c:\cirrus\pgsql.sln.metaproj : error :  
[c:\cirrus\pgsql.sln]
|[16:35:14.518] 
|[16:35:14.518] 0 Warning(s)
|[16:35:14.518] 149 Error(s)

Did you ever try to use clcache (or others) ?

When I tried, it refused to cache because of our debug settings
(DebugInformationFormat) - which seem to be enabled even in release mode.

I wonder if that'll be an issue for ccache, too.  I think that line may need to
be conditional on debug mode.

https://cirrus-ci.com/task/4808554103177216

|[17:14:28.765]   C:\ProgramData\chocolatey\lib\clcache\clcache\clcache.py 
Expanded commandline '['/c', '/Isrc/include', '/Isrc/include/port/win32', 
'/Isrc/include/port/win32_msvc', '/Ic:/openssl/1.1/\\include', '/Zi', 
'/nologo', '/W3', '/WX-', '/diagnostics:column', '/Ox', '/D', 'WIN32', '/D', 
'_WINDOWS', '/D', '__WINDOWS__', '/D', '__WIN32__', '/D', 
'WIN32_STACK_RLIMIT=4194304', '/D', '_CRT_SECURE_NO_DEPRECATE', '/D', 
'_CRT_NONSTDC_NO_DEPRECATE', '/D', 'FRONTEND', '/D', '_MBCS', '/GF', '/Gm-', 
'/EHsc', '/MD', '/GS', '/fp:precise', '/Zc:wchar_t', '/Zc:forScope', 
'/Zc:inline', '/Fo.\\Release\\libpgcommon\\', 
'/Fd.\\Release\\libpgcommon\\libpgcommon.pdb', '/external:W3', '/Gd', '/TC', 
'/wd4018', '/wd4244', '/wd4273', '/wd4101', '/wd4102', '/wd4090', '/wd4267', 
'/FC', '/errorReport:queue', '/MP', 'src/common/archive.c', 
'src/common/base64.c', 'src/common/checksum_helper.c', 
'src/common/config_info.c', 'src/common/controldata_utils.c', 
'src/common/cryptohash_openssl.c', 'src/common/d2s.c', 'src/common/encnames.c', 
'src/common/exec.c', 'src/common/f2s.c', 'src/common/fe_memutils.c', 
'src/common/file_perm.c', 'src/common/file_utils.c', 'src/common/hashfn.c', 
'src/common/hmac_openssl.c', 'src/common/ip.c', 'src/common/jsonapi.c', 
'src/common/keywords.c', 'src/common/kwlookup.c', 'src/common/link-canary.c', 
'src/common/logging.c', 'src/common/md5_common.c', 'src/common/pg_get_line.c', 
'src/common/pg_lzcompress.c', 'src/common/pg_prng.c', 'src/common/pgfnames.c', 
'src/common/protocol_openssl.c', 'src/common/psprintf.c', 
'src/common/relpath.c', 'src/common/restricted_token.c', 'src/common/rmtree.c', 
'src/common/saslprep.c', 'src/common/scram-common.c', 'src/common/sprompt.c', 
'src/common/string.c', 'src/common/stringinfo.c', 'src/common/unicode_norm.c', 
'src/common/username.c', 'src/common/wait_error.c', 'src/common/wchar.c']'
|[17:14:28.765]   C:\ProgramData\chocolatey\lib\clcache\clcache\clcache.py 
Cannot cache invocation as ['/c', '/Isrc/include', '/Isrc/include/port/win32', 
'/Isrc/include/port/win32_msvc', '/Ic:/openssl/1.1/\\include', '/Zi', 
'/nologo', '/W3', '/WX-', '/diagnostics:column', '/Ox', '/D', 'WIN32', '/D', 
'_WINDOWS', '/D', '__WINDOWS__', '/D', '__WIN32__', '/D', 
'WIN32_STACK_RLIMIT=4194304', '/D', '_CRT_SECURE_NO_DEPRECATE', '/D', 
'_CRT_NONSTDC_NO_DEPRECATE', '/D', 

Re: Adding CI to our tree

2022-02-16 Thread Peter Eisentraut

On 13.02.22 09:30, Andres Freund wrote:

BTW, docs can be built in parallel, and CI is using BUILD_JOBS: 4.
/usr/bin/xmllint --path . --noout --valid postgres.sgml
/usr/bin/xmllint --path . --noout --valid postgres.sgml
/usr/bin/xsltproc --path . --stringparam pg.version '15devel'  stylesheet.xsl 
postgres.sgml
/usr/bin/xsltproc --path . --stringparam pg.version '15devel'  
stylesheet-man.xsl postgres.sgml

Sure, it just doesn't make a difference:

make -j48 -C doc/src/sgml/ maintainer-clean && time make -j48 -C doc/src/sgml/
real0m34.626s
user0m34.342s
sys 0m0.298s

make -j48 -C doc/src/sgml/ maintainer-clean && time make -C doc/src/sgml/

real0m34.780s
user0m34.494s
sys 0m0.285s


Note that the default target in doc/src/sgml/ is "html", not "all".  If 
you build "all", you build "html" plus "man", which can be run in 
parallel.  (It's only two jobs, of course.)  If you're more ambitious, 
you could also run the PDF builds.






Re: Adding CI to our tree

2022-02-15 Thread Andres Freund
Hi,

On February 15, 2022 10:12:36 PM PST, Justin Pryzby  
wrote:
>On Sun, Feb 13, 2022 at 01:53:19PM -0800, Andres Freund wrote:
>> Hi,
>> 
>> On 2022-02-13 15:31:20 -0600, Justin Pryzby wrote:
>> > Oh - I suppose you're right.  That's an unfortunate consequence of running 
>> > a
>> > single prove instance without chdir.
>> 
>> I don't think it's chdir that's relevant (that changes into the source
>> directory after all). It's the TESTDIR environment variable.
>> 
>> I was thinking that we should make Utils.pm's INIT block responsible for
>> figuring out both the directory a test should run in and the log location,
>> instead having that in vcregress.pl and Makefile.global.in. Mostly because
>> doing it in the latter means we can't start tests with different TESTDIR and
>> working dir at the same time.
>> 
>> If instead we pass the location of the top-level build and top-level source
>> directory from vcregress.pl / Makefile.global, the tap test infrastructure 
>> can
>> figure out that stuff themselves, on a per-test basis.
>> 
>> For msvc builds we probably would need to pass in some information that allow
>> Utils.pm to set up PATH appropriately. I think that might just require 
>> knowing
>> that a) msvc build system is used b) Release vs Debug.
>
>I'm totally unsure if this resembles what you're thinking of, and I'm surprised
>I got it working so easily.  But it gets the tap test output in separate dirs,
>and CI is passing for everyone (windows failed because I injected a "false" to
>force it to upload artifacts).
>
>https://github.com/justinpryzby/postgres/runs/5211673291

Yes, that's along the lines I was thinking. I only checked it on my phone, so 
it certainly isn't a careful look...

I think this should be discussed in a separate thread, for visibility.

FWIW, I'd like to additionally add marker files in INIT and remove them in END. 
And create files signaling success and failure in END. That would allow 
automated selection of log files of failed tests...

Andres

Regards,

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Adding CI to our tree

2022-02-15 Thread Justin Pryzby
On Sun, Feb 13, 2022 at 01:53:19PM -0800, Andres Freund wrote:
> Hi,
> 
> On 2022-02-13 15:31:20 -0600, Justin Pryzby wrote:
> > Oh - I suppose you're right.  That's an unfortunate consequence of running a
> > single prove instance without chdir.
> 
> I don't think it's chdir that's relevant (that changes into the source
> directory after all). It's the TESTDIR environment variable.
> 
> I was thinking that we should make Utils.pm's INIT block responsible for
> figuring out both the directory a test should run in and the log location,
> instead having that in vcregress.pl and Makefile.global.in. Mostly because
> doing it in the latter means we can't start tests with different TESTDIR and
> working dir at the same time.
> 
> If instead we pass the location of the top-level build and top-level source
> directory from vcregress.pl / Makefile.global, the tap test infrastructure can
> figure out that stuff themselves, on a per-test basis.
> 
> For msvc builds we probably would need to pass in some information that allow
> Utils.pm to set up PATH appropriately. I think that might just require knowing
> that a) msvc build system is used b) Release vs Debug.

I'm totally unsure if this resembles what you're thinking of, and I'm surprised
I got it working so easily.  But it gets the tap test output in separate dirs,
and CI is passing for everyone (windows failed because I injected a "false" to
force it to upload artifacts).

https://github.com/justinpryzby/postgres/runs/5211673291

commit 899e562102dd7a663cb087cdf88f0f78f8302492
Author: Justin Pryzby 
Date:   Tue Feb 15 20:02:36 2022 -0600

wip: set TESTDIR from src/test/perl rather than Makefile/vcregress

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 05c54b27def..1e49d8c8c37 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -450,7 +450,7 @@ define prove_installcheck
 rm -rf '$(CURDIR)'/tmp_check
 $(MKDIR_P) '$(CURDIR)'/tmp_check
 cd $(srcdir) && \
-   TESTDIR='$(CURDIR)' PATH="$(bindir):$(CURDIR):$$PATH" \
+   PATH="$(bindir):$(CURDIR):$$PATH" \
PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)/$(top_builddir)' \
PG_REGRESS='$(CURDIR)/$(top_builddir)/src/test/regress/pg_regress' \
$(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if 
$(PROVE_TESTS),$(PROVE_TESTS),t/*.pl)
@@ -460,7 +460,7 @@ define prove_installcheck
 rm -rf '$(CURDIR)'/tmp_check
 $(MKDIR_P) '$(CURDIR)'/tmp_check
 cd $(srcdir) && \
-   TESTDIR='$(CURDIR)' PATH="$(bindir):$(CURDIR):$$PATH" \
+   PATH="$(bindir):$(CURDIR):$$PATH" \
PGPORT='6$(DEF_PGPORT)' top_builddir='$(top_builddir)' \
PG_REGRESS='$(top_builddir)/src/test/regress/pg_regress' \
$(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if 
$(PROVE_TESTS),$(PROVE_TESTS),t/*.pl)
@@ -471,7 +471,7 @@ define prove_check
 rm -rf '$(CURDIR)'/tmp_check
 $(MKDIR_P) '$(CURDIR)'/tmp_check
 cd $(srcdir) && \
-   TESTDIR='$(CURDIR)' $(with_temp_install) PGPORT='6$(DEF_PGPORT)' \
+   $(with_temp_install) PGPORT='6$(DEF_PGPORT)' \
PG_REGRESS='$(CURDIR)/$(top_builddir)/src/test/regress/pg_regress' \
$(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if 
$(PROVE_TESTS),$(PROVE_TESTS),t/*.pl)
 endef
diff --git a/src/bin/psql/t/010_tab_completion.pl 
b/src/bin/psql/t/010_tab_completion.pl
index 005961f34d4..a86dc78a365 100644
--- a/src/bin/psql/t/010_tab_completion.pl
+++ b/src/bin/psql/t/010_tab_completion.pl
@@ -70,7 +70,7 @@ delete $ENV{LS_COLORS};
 # to run in the build directory so that we can use relative paths to
 # access the tmp_check subdirectory; otherwise the output from filename
 # completion tests is too variable.
-if ($ENV{TESTDIR})
+if ($ENV{TESTDIR} && 0)
 {
chdir $ENV{TESTDIR} or die "could not chdir to \"$ENV{TESTDIR}\": $!";
 }
diff --git a/src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl 
b/src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl
index facfec5cad4..2a0eca77440 100644
--- a/src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl
+++ b/src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl
@@ -49,9 +49,7 @@ for my $testname (@tests)
my $expected;
my $result;
 
-   # Hack to allow TESTDIR=. during parallel tap tests
-   my $inputdir = 
"$ENV{'TESTDIR'}/src/test/modules/libpq_pipeline";
-   $inputdir = "$ENV{'TESTDIR'}" if ! -e $inputdir;
+   my $inputdir = "$ENV{'TESTDIR'}/tmp_check";
$expected = slurp_file_eval("$inputdir/traces/$testname.trace");
next unless $expected ne "";
$result = slurp_file_eval($traceout);
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm 
b/src/test/perl/PostgreSQL/Test/Utils.pm
index 57fcb240898..5429de41ed5 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -184,19 +184,21 @@ INIT
# test may still fail, but it's more likely to report useful facts.
$SIG{PIPE} = 'IGNORE';
 
-   # Determine output directories, and create them.  The base 

Re: Adding CI to our tree

2022-02-13 Thread Andres Freund
Hi,

On 2022-02-13 15:42:13 -0600, Justin Pryzby wrote:
> > Note that prove unfortunately serializes the test output to be in the order 
> > it
> > started them, even when tests run concurrently. Extremely unhelpful, but ...
> 
> Are you sure ?

Somewhat. I think it's a question of the prove version and some autodetection
of what type of environment prove is running in (stdin/stdout/stderr). I don't
remember the details, but at some point I pinpointed the source of the
serialization, and verified that parallelization makes a significant
difference in runtime even without being easily visible :(.  But this is all
vague memory, so I might be wrong.

Reminds me that somebody (ugh, me???) should fix the perl > 5.26
incompatibilities on windows, then we'd also get a newer prove...



> > One nice bit is that the output is a *lot* easier to read.
> > 
> > You could try improving the total time by having prove remember slow tests 
> > and
> > use that file to run the slowest tests first next time. --state slow,save or
> > such I believe. Of course we'd have to save that state file...
> 
> In a test, this hurt rather than helped (13m 42s).
> https://cirrus-ci.com/task/6359167186239488
> 
> I'm not surprised - it makes sense to run 10 fast tests at once, but usually
> doesn't make sense to run 10 slow tests tests at once (at least a couple of
> which are doing something intensive).  It was faster (12m16s) to do it
> backwards (fastest tests first).
> https://cirrus-ci.com/task/5745115443494912

Hm.

I know I saw significant reduction in test times locally with meson by
starting slow tests earlier, because they're the limiting factor for the
*overall* test runtime - but I have more resources than on cirrus.  Even
locally on a windows VM, with the current buildsystem, I found that moving 027
to earlier withing recoverycheck reduced the test time.

But it's possible that with all tests being scheduled concurrently, starting
the slow tests early leads to sufficient resource overcommit to be
problematic.


> BTW, does it make sense to remove test_regress_parallel_script ?  The
> pg_upgrade run would do the same things, no ?  If so, it might make sense to
> run that first.  OTOH, you suggested to run the upgrade tests with checksums
> enabled, which seems like a good idea.

No, I don't think so. The main regression tests are by far the most important
thing during normal development. Just relying on main regression test runs
embedded in other tests, with different output and config of the main
regression test imo is just confusing.


Greetings,

Andres Freund




Re: Adding CI to our tree

2022-02-13 Thread Andres Freund
Hi,

On 2022-02-13 15:31:20 -0600, Justin Pryzby wrote:
> Oh - I suppose you're right.  That's an unfortunate consequence of running a
> single prove instance without chdir.

I don't think it's chdir that's relevant (that changes into the source
directory after all). It's the TESTDIR environment variable.

I was thinking that we should make Utils.pm's INIT block responsible for
figuring out both the directory a test should run in and the log location,
instead having that in vcregress.pl and Makefile.global.in. Mostly because
doing it in the latter means we can't start tests with different TESTDIR and
working dir at the same time.

If instead we pass the location of the top-level build and top-level source
directory from vcregress.pl / Makefile.global, the tap test infrastructure can
figure out that stuff themselves, on a per-test basis.

For msvc builds we probably would need to pass in some information that allow
Utils.pm to set up PATH appropriately. I think that might just require knowing
that a) msvc build system is used b) Release vs Debug.

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-02-13 Thread Justin Pryzby
On Sat, Feb 12, 2022 at 02:26:25PM -0800, Andres Freund wrote:
> On 2022-02-12 16:06:40 -0600, Justin Pryzby wrote:
> > I had some success with that, but it doesn't seem to be significantly 
> > faster -
> > it looks a lot like the tests are not actually running in parallel.

Note that the total test time is close to the sum of the individual test times.
But I think that may be an artifact of how prove is showing/attributing times
to each test (which, if so, is misleading).

> Note that prove unfortunately serializes the test output to be in the order it
> started them, even when tests run concurrently. Extremely unhelpful, but ...

Are you sure ?  When I run it locally, I see:
rm -fr src/test/recovery/tmp_check ; time PERL5LIB=`pwd`/src/test/perl 
TESTDIR=`pwd`/src/test/recovery 
PATH=`pwd`/tmp_install/usr/local/pgsql/bin:$PATH 
PG_REGRESS=`pwd`/src/test/regress/pg_regress 
REGRESS_SHLIB=`pwd`/src/test/regress/regress.so prove --time -j4 --ext '*.pl' 
`find src -name t`
...
[15:34:48] src/bin/scripts/t/101_vacuumdb_all.pl ... ok 
 104 ms ( 0.00 usr  0.00 sys +  2.35 cusr  0.47 csys =  2.82 CPU)
[15:34:49] src/bin/scripts/t/090_reindexdb.pl .. ok 
8894 ms ( 0.06 usr  0.01 sys + 14.45 cusr  3.38 csys = 17.90 CPU)
[15:34:50] src/bin/pg_config/t/001_pg_config.pl  ok 
  79 ms ( 0.00 usr  0.01 sys +  0.23 cusr  0.04 csys =  0.28 CPU)
[15:34:50] src/bin/pg_waldump/t/001_basic.pl ... ok 
  35 ms ( 0.00 usr  0.00 sys +  0.26 cusr  0.02 csys =  0.28 CPU)
[15:34:51] src/bin/pg_test_fsync/t/001_basic.pl  ok 
 100 ms ( 0.01 usr  0.00 sys +  0.24 cusr  0.04 csys =  0.29 CPU)
[15:34:51] src/bin/pg_archivecleanup/t/010_pg_archivecleanup.pl  ok 
 177 ms ( 0.02 usr  0.00 sys +  0.26 cusr  0.03 csys =  0.31 CPU)
[15:34:55] src/bin/scripts/t/100_vacuumdb.pl ... ok
11267 ms ( 0.12 usr  0.04 sys + 13.47 cusr  3.20 csys = 16.83 CPU)
[15:34:57] src/bin/scripts/t/102_vacuumdb_stages.pl  ok 
5802 ms ( 0.06 usr  0.01 sys +  7.70 cusr  1.37 csys =  9.14 CPU)
...

=> scripts/ stuff, followed by other stuff, followed by more, slower, scripts/ 
stuff.

But I never saw that in cirrus.

> Isn't this kind of a good test time? I thought earlier your alltaptests target
> took a good bit longer?

The original alltaptests runs in 16m 21s.
https://cirrus-ci.com/task/6679061752709120

2 weeks ago, it was ~14min with your patch to cache initdb.
https://cirrus-ci.com/task/5439320633901056

As I recall, that didn't seem to improve runtime when combined with my parallel
patch.

> One nice bit is that the output is a *lot* easier to read.
> 
> You could try improving the total time by having prove remember slow tests and
> use that file to run the slowest tests first next time. --state slow,save or
> such I believe. Of course we'd have to save that state file...

In a test, this hurt rather than helped (13m 42s).
https://cirrus-ci.com/task/6359167186239488

I'm not surprised - it makes sense to run 10 fast tests at once, but usually
doesn't make sense to run 10 slow tests tests at once (at least a couple of
which are doing something intensive).  It was faster (12m16s) to do it
backwards (fastest tests first).
https://cirrus-ci.com/task/5745115443494912

BTW, does it make sense to remove test_regress_parallel_script ?  The
pg_upgrade run would do the same things, no ?  If so, it might make sense to
run that first.  OTOH, you suggested to run the upgrade tests with checksums
enabled, which seems like a good idea.

Note that in the attached patches, I changed the msvc "warnings" to use "tee".

I don't know how to fix the pipeline test in a less hacky way...

You said the docs build should be a separate task, but then said that it'd be
okay to remove the dependency.  So I did it both ways.  There's currently some
duplication between the docs patch and code coverage patch.

-- 
Justin
>From 05c24a1e679e5ae0dd0cb6504d397f77c8b1fc5c Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 17 Jan 2022 00:53:04 -0600
Subject: [PATCH 1/8] cirrus: include hints how to install OS packages..

This is useful for patches during development, but once a features is merged,
new libraries should be added to the OS image files, rather than installed
during every CI run forever into the future.
---
 .cirrus.yml | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index dd96a97efe5..eda8ac9596c 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -73,10 +73,11 @@ task:
 chown -R postgres:postgres .
 mkdir -p ${CCACHE_DIR}
 chown -R postgres:postgres ${CCACHE_DIR}
-  setup_cores_script: |
+  setup_os_script: |
 mkdir -m 770 /tmp/cores
 chown root:postgres /tmp/cores
 sysctl kern.corefile='/tmp/cores/%N.%P.core'
+#pkg install -y ...
 
   # NB: Intentionally build without --with-llvm. The freebsd image 

Re: Adding CI to our tree

2022-02-13 Thread Justin Pryzby
On Sun, Feb 13, 2022 at 01:23:16PM -0800, Andres Freund wrote:
> If you're seeing this on windows on one of your test branches, that's much
> more likely to be caused by the alltaptests stuff, than by the change in
> artifact instruction.

Oh - I suppose you're right.  That's an unfortunate consequence of running a
single prove instance without chdir.

-- 
Justin




Re: Adding CI to our tree

2022-02-13 Thread Andres Freund
Hi,

On 2022-02-13 15:02:50 -0600, Justin Pryzby wrote:
> On Sat, Feb 12, 2022 at 04:24:20PM -0800, Andres Freund wrote:
> > > What I am excited about is that some of your other changes showed that we
> > > don't need separate *_artifacts for separate directories anymore. That 
> > > used to
> > > be the case, but an array of paths is now supported. Putting log, diffs, 
> > > and
> > > regress_log in one directory will be considerably more convenient...
> >
> > pushed together.
>
> This change actually complicates things.
>
> Before, there was log/src/test/recovery/tmp_check/log, with a few files for
> 001, a few for 002, a few for 003.  This are a lot of output files, but at
> least they're all related.

> Now, there's a single log/tmp_check/log, which has logs for the entire tap
> tests.  There's 3 pages of 001*, 2 pages of 002*, 3 pages of 003, etc.

Hm? Doesn't look like that to me, and I don't see why it would work that way?
This didn't do anything to flatten the directory hierarchy, just combine three
hierarchies into one?

What I see, and what I expect, is that logs end up in
e.g. log/src/test/recovery/tmp_check/log but that that directory contains
regress_log_*, as well as *.log?  Before one needed to go through the
hierarchy multiple times to see both regress_log_ (i.e. tap test log) as well
as 0*.log (i.e. server logs).

A random example: https://cirrus-ci.com/task/5152523873943552 shows the logs
for the failure in log/src/bin/pg_upgrade/tmp_check/log


If you're seeing this on windows on one of your test branches, that's much
more likely to be caused by the alltaptests stuff, than by the change in
artifact instruction.


Greetings,

Andres Freund




Re: Adding CI to our tree

2022-02-13 Thread Justin Pryzby
On Sat, Feb 12, 2022 at 04:24:20PM -0800, Andres Freund wrote:
> > What I am excited about is that some of your other changes showed that we
> > don't need separate *_artifacts for separate directories anymore. That used 
> > to
> > be the case, but an array of paths is now supported. Putting log, diffs, and
> > regress_log in one directory will be considerably more convenient...
> 
> pushed together.

This change actually complicates things.

Before, there was log/src/test/recovery/tmp_check/log, with a few files for
001, a few for 002, a few for 003.  This are a lot of output files, but at
least they're all related.

Now, there's a single log/tmp_check/log, which has logs for the entire tap
tests.  There's 3 pages of 001*, 2 pages of 002*, 3 pages of 003, etc.  

-- 
Justin




Re: Adding CI to our tree

2022-02-13 Thread Andres Freund
Hi,

On 2022-02-13 15:09:20 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2022-02-13 12:13:17 -0500, Tom Lane wrote:
> >> Right.  Can we set things up so that it's not too painful to inject
> >> custom build options into a CI build?
> 
> > What kind of injection are you thinking about?
> 
> That's exactly what needs to be decided.
> [...]
> I'd prefer something a bit more out-of-band.  I don't know this technology
> well enough to propose a way.

I don't yet understand the precise use case for adjustments well enough to
propose something. Who would like to adjust something for what purpose?  The
"original" author, for a one-off test? A reviewer / committer, to track down a
hunch?

If it's about CI runs in in a personal repository, one can set additional
environment vars from the CI management interface. We can make sure they work
(the ci stuff probably overrides CFLAGS, but COPT should work) and document
the way to do so.


> > A patch author can obviously
> > just add options in .cirrus.yml. That's something possible now, that was not
> > possible with cfbot applying its own .cirrus.yml
> 
> Fine, but are committers supposed to keep track of the fact that they
> shouldn't commit that part of a patch?

I'd say it depends on the the specific modification - there's some patches
where it seems to make sense to adjust extend CI as part of it and have it
merged. But yea, in other cases committers would have to take them out.


For more on-off stuff one would presumably not want to spam the list the list
with a full patchset to trigger a cfbot run, nor wait till cfbot gets round to
that patch again. When pushing to a personal repo it's of course easy to just
have a commit on-top of what's submitted to play around with compile options.

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-02-13 Thread Tom Lane
Andres Freund  writes:
> On 2022-02-13 12:13:17 -0500, Tom Lane wrote:
>> Right.  Can we set things up so that it's not too painful to inject
>> custom build options into a CI build?

> What kind of injection are you thinking about?

That's exactly what needs to be decided.

> A patch author can obviously
> just add options in .cirrus.yml. That's something possible now, that was not
> possible with cfbot applying its own .cirrus.yml

Fine, but are committers supposed to keep track of the fact that they
shouldn't commit that part of a patch?  I'd prefer something a bit more
out-of-band.  I don't know this technology well enough to propose a way.

> I'd like to have things like -fanitize=aligned and
> -DWRITE_READ_PARSE_PLAN_TREES on by default for CI, primarily for cfbot's
> benefit. Most patch authors won't know about using
> -DWRITE_READ_PARSE_PLAN_TREES etc, so they won't even think about enabling
> them. We're *really* not doing well on the "timely review" side of things, so
> we at least should not waste time on high latency back-forth for easily
> automatically detectable things.

I don't personally have an objection to either of those; maybe Robert
does.

regards, tom lane




Re: Adding CI to our tree

2022-02-13 Thread Andres Freund
Hi,

On 2022-02-13 12:13:17 -0500, Tom Lane wrote:
> Robert Haas  writes:
> > This is exactly why I'm not a huge fan of having ci stuff in the tree.
> > It supposes that there's one right way to do a build, but in reality,
> > different people want and indeed need to use different options for all
> > kinds of reasons.

Sure. But why is that an argument against "having ci stuff in the tree"?

All it does is to make sure that a certain base-level of testing is easy to
achieve for everyone. I don't like working on windows or mac, but my patches
often have platform dependent bits. Now it's less likely that I need to
manually interact with windows.

I don't think we can (or well should) replace the buildfarm with the CI
stuff. The buildfarm provides extensive and varied coverage for master/release
branches. Which isn't feasible for unmerged development work, including cfbot,
from a resource usage POV alone.


> > That's the whole value of having things like
> > configure and pg_config_manual.h. When we start arguing about whether
> > or ci builds should use -DWRITE_READ_PARSE_PLAN_TREES we're inevitably
> > into the realm where no choice is objectively better,

> Right.  Can we set things up so that it's not too painful to inject
> custom build options into a CI build?

What kind of injection are you thinking about? A patch author can obviously
just add options in .cirrus.yml. That's something possible now, that was not
possible with cfbot applying its own .cirrus.yml

It'd be nice if there were a way to do it more easily for msvc and configure
builds together, right now it'd require modifying those tasks in different
ways. But that's not really a CI question.


I'd like to have things like -fanitize=aligned and
-DWRITE_READ_PARSE_PLAN_TREES on by default for CI, primarily for cfbot's
benefit. Most patch authors won't know about using
-DWRITE_READ_PARSE_PLAN_TREES etc, so they won't even think about enabling
them. We're *really* not doing well on the "timely review" side of things, so
we at least should not waste time on high latency back-forth for easily
automatically detectable things.


> I should think that at the very least one needs to be able to vary the
> configure switches and CPPFLAGS/CFLAGS.

Do you mean as part of a patch tested with cfbot, CI running for pushes to
your own repository, or ...?

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-02-13 Thread Tom Lane
Robert Haas  writes:
> This is exactly why I'm not a huge fan of having ci stuff in the tree.
> It supposes that there's one right way to do a build, but in reality,
> different people want and indeed need to use different options for all
> kinds of reasons. That's the whole value of having things like
> configure and pg_config_manual.h. When we start arguing about whether
> or ci builds should use -DWRITE_READ_PARSE_PLAN_TREES we're inevitably
> into the realm where no choice is objectively better,

Right.  Can we set things up so that it's not too painful to inject
custom build options into a CI build?  I should think that at the
very least one needs to be able to vary the configure switches and
CPPFLAGS/CFLAGS.

regards, tom lane




Re: Adding CI to our tree

2022-02-13 Thread Robert Haas
On Sun, Feb 13, 2022 at 3:30 AM Andres Freund  wrote:
> > There's other things that it'd be nice to enable, but maybe these don't 
> > need to
> > be on by default.  Maybe just have a list of optional compiler flags (and 
> > hints
> > when they're useful).  Like WRITE_READ_PARSE_PLAN_TREES.
>
> I think it'd be good to enable a reasonable set by default. Particularly for
> newer contributors stuff like forgetting in/out/readfuncs, or not knowing
> about some undefined behaviour, is easy. Probably makes sense to use different
> settings on different tasks.

This is exactly why I'm not a huge fan of having ci stuff in the tree.
It supposes that there's one right way to do a build, but in reality,
different people want and indeed need to use different options for all
kinds of reasons. That's the whole value of having things like
configure and pg_config_manual.h. When we start arguing about whether
or ci builds should use -DWRITE_READ_PARSE_PLAN_TREES we're inevitably
into the realm where no choice is objectively better, and whoever
yells the loudest will get it the way they want, and then somebody
else later will say "well that's dumb I don't like that" or even just
"well that's not the right thing for testing MY patch," at which point
the previous mailing list discussion will be cited as "precedent" for
what was essentially an arbitrary decision made by 1 or 2 people.

Mind you, I'm not trying to hold back the tide. I realize that in-tree
ci stuff is very much in vogue. But I think it would be a very healthy
thing if we acknowledged that what goes in there is far more arbitrary
than most of what we put in the tree.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Adding CI to our tree

2022-02-13 Thread Andres Freund
Hi,

On 2022-02-12 23:19:38 -0600, Justin Pryzby wrote:
> On Sat, Feb 12, 2022 at 05:20:08PM -0800, Andres Freund wrote:
> > What was the reason behind moving the docs stuff from the compiler warnings
> > task to linux?
> 
> I wanted to build docs even if the linux task fails.  To allow CFBOT to link 
> to
> them, so somoene can always review the docs, in HTML (rather than reading SGML
> with lines prefixed with +).

I'd be ok with running the compiler warnings job without the dependency, if
that's the connection.


> BTW, docs can be built in parallel, and CI is using BUILD_JOBS: 4.
> /usr/bin/xmllint --path . --noout --valid postgres.sgml
> /usr/bin/xmllint --path . --noout --valid postgres.sgml
> /usr/bin/xsltproc --path . --stringparam pg.version '15devel'  stylesheet.xsl 
> postgres.sgml
> /usr/bin/xsltproc --path . --stringparam pg.version '15devel'  
> stylesheet-man.xsl postgres.sgml

Sure, it just doesn't make a difference:

make -j48 -C doc/src/sgml/ maintainer-clean && time make -j48 -C doc/src/sgml/
real0m34.626s
user0m34.342s
sys 0m0.298s

make -j48 -C doc/src/sgml/ maintainer-clean && time make -C doc/src/sgml/

real0m34.780s
user0m34.494s
sys 0m0.285s



> > 1) iterate over release branches and see which has the smallest diff
> 
> Maybe for each branch: do echo `git revlist or merge base |wc -l` $branch; 
> done |sort -n |head -1
> 
> > > > Is looking at the .c files in the change really a reliable predictor of 
> > > > where
> > > > code coverage changes? I'm doubtful. Consider stuff like removing the 
> > > > last
> > > > user of some infrastructure or such. Or adding the first.
> > >
> > > You're right that it isn't particularly accurate, but it's a step forward 
> > > if
> > > lots of patches use it to check/improve coverge of new code.
> > 
> > Maybe it's good enough... The overhead in test runtime is noticeable (~5.30m
> > -> ~7.15m), but probably acceptable. Although I also would like to enable
> > -fsanitize=alignment and -fsanitize=alignment, which add about 2 minutes as
> > well (-fsanitize=address is a lot more expensive), they both work best on
> > linux.
> 
> There's other things that it'd be nice to enable, but maybe these don't need 
> to
> be on by default.  Maybe just have a list of optional compiler flags (and 
> hints
> when they're useful).  Like WRITE_READ_PARSE_PLAN_TREES.

I think it'd be good to enable a reasonable set by default. Particularly for
newer contributors stuff like forgetting in/out/readfuncs, or not knowing
about some undefined behaviour, is easy. Probably makes sense to use different
settings on different tasks.

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-02-12 Thread Justin Pryzby
On Sat, Feb 12, 2022 at 05:20:08PM -0800, Andres Freund wrote:
> On 2022-02-03 23:04:04 -0600, Justin Pryzby wrote:
> > > I'm a bit worried about the increased storage and runtime overhead due to 
> > > the
> > > docs changes. We probably can make it a good bit cheaper though.
> >
> > If you mean overhead of additional disk operations, it shouldn't be an 
> > issue,
> > since the git clone uses shared references (not even hardlinks).
> 
> I was concerned about those and just the increased runtime of the script. Our
> sources are 130MB, leaving the shared .git aside. But maybe it's just fine.
> 
> We probably can just get rid of the separate clone and configure run though?
> Build the docs, copy the output, do a git co parent docs/, build again?

Yes - works great, thanks.

> What was the reason behind moving the docs stuff from the compiler warnings
> task to linux?

I wanted to build docs even if the linux task fails.  To allow CFBOT to link to
them, so somoene can always review the docs, in HTML (rather than reading SGML
with lines prefixed with +).

> Not that either fits very well... I think it might be worth
> moving the docs stuff into its own task, using a 1 CPU container (docs build
> isn't parallel anyway). Think that'll be easier to see in the cfbot page. I

Yeah.  The only drawback is the duplication (including the "git parent" stuff).

BTW, docs can be built in parallel, and CI is using BUILD_JOBS: 4.
/usr/bin/xmllint --path . --noout --valid postgres.sgml
/usr/bin/xmllint --path . --noout --valid postgres.sgml
/usr/bin/xsltproc --path . --stringparam pg.version '15devel'  stylesheet.xsl 
postgres.sgml
/usr/bin/xsltproc --path . --stringparam pg.version '15devel'  
stylesheet-man.xsl postgres.sgml

> 1) iterate over release branches and see which has the smallest diff

Maybe for each branch: do echo `git revlist or merge base |wc -l` $branch; done 
|sort -n |head -1

> > > Is looking at the .c files in the change really a reliable predictor of 
> > > where
> > > code coverage changes? I'm doubtful. Consider stuff like removing the last
> > > user of some infrastructure or such. Or adding the first.
> >
> > You're right that it isn't particularly accurate, but it's a step forward if
> > lots of patches use it to check/improve coverge of new code.
> 
> Maybe it's good enough... The overhead in test runtime is noticeable (~5.30m
> -> ~7.15m), but probably acceptable. Although I also would like to enable
> -fsanitize=alignment and -fsanitize=alignment, which add about 2 minutes as
> well (-fsanitize=address is a lot more expensive), they both work best on
> linux.

There's other things that it'd be nice to enable, but maybe these don't need to
be on by default.  Maybe just have a list of optional compiler flags (and hints
when they're useful).  Like WRITE_READ_PARSE_PLAN_TREES.

> > In addition to the HTML generated for changed .c files, it's possible to 
> > create
> > a coverage.gcov output for everything, which could be retrieved to generate
> > full HTML locally.  It's ~8MB (or 2MB after gzip).
> 
> Note sure that doing doing it locally adds much over just running tests
> locally.

You're right, since one needs to have the patched source files to generate the
HTML.

On Thu, Feb 03, 2022 at 11:57:18AM -0800, Andres Freund wrote:
> I think it may be a bit cleaner to have the "install additional packages"
> "template step" be just that, and not merge in other contents into it?

I renamed the "cores" task since it consistently makes me think you're doing
with CPU cores.  It took it as an opportunity to generalize the task.

These patches are ready for review.  I'll plan to mail about TAP stuff
tomorrow.
>From 05c24a1e679e5ae0dd0cb6504d397f77c8b1fc5c Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 17 Jan 2022 00:53:04 -0600
Subject: [PATCH 1/3] cirrus: include hints how to install OS packages..

This is useful for patches during development, but once a features is merged,
new libraries should be added to the OS image files, rather than installed
during every CI run forever into the future.
---
 .cirrus.yml | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index dd96a97efe5..eda8ac9596c 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -73,10 +73,11 @@ task:
 chown -R postgres:postgres .
 mkdir -p ${CCACHE_DIR}
 chown -R postgres:postgres ${CCACHE_DIR}
-  setup_cores_script: |
+  setup_os_script: |
 mkdir -m 770 /tmp/cores
 chown root:postgres /tmp/cores
 sysctl kern.corefile='/tmp/cores/%N.%P.core'
+#pkg install -y ...
 
   # NB: Intentionally build without --with-llvm. The freebsd image size is
   # already large enough to make VM startup slow, and even without llvm
@@ -180,10 +181,12 @@ task:
 chown -R postgres:postgres ${CCACHE_DIR}
 echo '* - memlock 134217728' > /etc/security/limits.d/postgres.conf
 su postgres -c "ulimit -l -H && ulimit -l -S"
-  setup_cores_script: |
+  setup_os_script: |
   

Re: Adding CI to our tree

2022-02-12 Thread Andres Freund
On 2022-02-12 20:47:04 -0600, Justin Pryzby wrote:
> While rebasing, I noticed an error.
> 
> You wrote **/.diffs, but should be **/*.diffs

Embarassing. Thanks for noticing! Pushed the fix...




Re: Adding CI to our tree

2022-02-12 Thread Justin Pryzby
On Sat, Feb 12, 2022 at 04:24:20PM -0800, Andres Freund wrote:
> > > e5286ede1b4 cirrus: avoid unnecessary double star **
> > 
> > Can't get excited about this, but whatever.
> > 
> > What I am excited about is that some of your other changes showed that we
> > don't need separate *_artifacts for separate directories anymore. That used 
> > to
> > be the case, but an array of paths is now supported. Putting log, diffs, and
> > regress_log in one directory will be considerably more convenient...
> 
> pushed together.

While rebasing, I noticed an error.

You wrote **/.diffs, but should be **/*.diffs

--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -30,15 +30,11 @@ env:
 # What files to preserve in case tests fail
 on_failure: _failure
   log_artifacts:
-path: "**/**.log"
+paths:
+  - "**/*.log"
+  - "**/.diffs"
+  - "**/regress_log_*"
 type: text/plain
-  regress_diffs_artifacts:
-path: "**/**.diffs"
-type: text/plain
-  tap_artifacts:
-path: "**/regress_log_*"
-type: text/plain
-




Re: Adding CI to our tree

2022-02-12 Thread Andres Freund
Hi,

Alvaro adding you because the "branch" discussion in the MERGE thread.


On 2022-02-03 23:04:04 -0600, Justin Pryzby wrote:
> > I'm a bit worried about the increased storage and runtime overhead due to 
> > the
> > docs changes. We probably can make it a good bit cheaper though.
>
> If you mean overhead of additional disk operations, it shouldn't be an issue,
> since the git clone uses shared references (not even hardlinks).

I was concerned about those and just the increased runtime of the script. Our
sources are 130MB, leaving the shared .git aside. But maybe it's just fine.

We probably can just get rid of the separate clone and configure run though?
Build the docs, copy the output, do a git co parent docs/, build again?


What was the reason behind moving the docs stuff from the compiler warnings
task to linux? Not that either fits very well... I think it might be worth
moving the docs stuff into its own task, using a 1 CPU container (docs build
isn't parallel anyway). Think that'll be easier to see in the cfbot page. I
think it's also good to run it independent of the linux task succeeding - a
docs failure seems like a separate enough issue.


> > > 9d0f03d3450 wip: cirrus: code coverage
> >
> > I don't think it's good to just unconditionally reference the master branch
> > here. It'll do bogus stuff once 15 is branched off. It works for cfbot, but
> > not other uses.
>
> That's only used for filtering changed files.  It uses git diff --cherry-pick
> postgres/master..., which would *try* to avoid showing anything which is also
> in master.

You commented in another email on this:

On 2022-02-11 12:51:50 -0600, Justin Pryzby wrote:
> Because I put your patch on top of some other branch with the CI coverage (and
> other stuff).
>
> It tries to only show files changed by the branch being checked:
> https://github.com/justinpryzby/postgres/commit/d668142040031915
>
> But it has to figure out where the branch "starts".  Which I did by looking at
> "git diff --cherry-pick origin..."
>
> Andres thinks that does the wrong thing if CI is run manually (not by CFBOT)
> for patches against backbranches.  I'm not sure git diff --cherry-pick is
> widely known/used, but I think using that relative to master may be good
> enough.

Note that I'm not concerned about "manually" running CI against other branches
- I'm concerned about the point where where 15 branches off and CI will
automatically also run against 15. E.g. in the postgres repo
https://cirrus-ci.com/github/postgres/postgres/

I can see a few ways to deal with this:
1) iterate over release branches and see which has the smallest diff
2) parse the branch name, if it's a cfbot run, we know it's master, otherwise 
skip
3) change cfbot so that it maintains things as pull requests, which have a
   base branch


> > Is looking at the .c files in the change really a reliable predictor of 
> > where
> > code coverage changes? I'm doubtful. Consider stuff like removing the last
> > user of some infrastructure or such. Or adding the first.
>
> You're right that it isn't particularly accurate, but it's a step forward if
> lots of patches use it to check/improve coverge of new code.

Maybe it's good enough... The overhead in test runtime is noticeable (~5.30m
-> ~7.15m), but probably acceptable. Although I also would like to enable
-fsanitize=alignment and -fsanitize=alignment, which add about 2 minutes as
well (-fsanitize=address is a lot more expensive), they both work best on
linux.


> In addition to the HTML generated for changed .c files, it's possible to 
> create
> a coverage.gcov output for everything, which could be retrieved to generate
> full HTML locally.  It's ~8MB (or 2MB after gzip).

Note sure that doing doing it locally adds much over just running tests
locally.

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-02-12 Thread Andres Freund
Hi,

On 2022-02-03 11:57:18 -0800, Andres Freund wrote:
> > 95793675633 cirrus: spell ccache_maxsize
> 
> Yep, will apply with a bunch of your other changes, if you answer a question
> or two...

Pushed.


> > e5286ede1b4 cirrus: avoid unnecessary double star **
> 
> Can't get excited about this, but whatever.
> 
> What I am excited about is that some of your other changes showed that we
> don't need separate *_artifacts for separate directories anymore. That used to
> be the case, but an array of paths is now supported. Putting log, diffs, and
> regress_log in one directory will be considerably more convenient...

pushed together.


> > 398b7342349 cirrus/macos: uname -a and export at end of sysinfo
> 
> Shrug.

Pushed.

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-02-12 Thread Andres Freund
Hi,

On 2022-02-12 16:06:40 -0600, Justin Pryzby wrote:
> I had some success with that, but it doesn't seem to be significantly faster -
> it looks a lot like the tests are not actually running in parallel.

Note that prove unfortunately serializes the test output to be in the order it
started them, even when tests run concurrently. Extremely unhelpful, but ...

Isn't this kind of a good test time? I thought earlier your alltaptests target
took a good bit longer?

One nice bit is that the output is a *lot* easier to read.


You could try improving the total time by having prove remember slow tests and
use that file to run the slowest tests first next time. --state slow,save or
such I believe. Of course we'd have to save that state file...

To verify that tests actually run concurrently you could emit a few
notes. Looks like those are output immediately...

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-02-12 Thread Justin Pryzby
On Tue, Jan 18, 2022 at 05:16:26PM -0800, Andres Freund wrote:
> On 2022-01-18 15:08:47 -0600, Justin Pryzby wrote:
> > On Mon, Jan 17, 2022 at 12:16:19PM -0800, Andres Freund wrote:
> > > I think it might still be worth adding stopgap way of running all tap 
> > > tests on
> > > windows though. Having a vcregress.pl function to find all directories 
> > > with t/
> > > and run the tests there, shouldn't be a lot of code...
> > 
> > I started doing that, however it makes CI/windows even slower.
...
> > I think it'll be necessary to run prove with all the tap tests to
> > parallelize them, rather than looping around directories, many of which have
> > only a single file, and are run serially.
> 
> That's unfortunately not trivially possible. Quite a few tests currently rely
> on being called in a specific directory. We should fix this, but it's not a
> trivial amount of work.

On Sat, Feb 05, 2022 at 07:23:39PM -0800, Andres Freund wrote:
> On 2022-02-03 23:04:04 -0600, Justin Pryzby wrote:
> > > I assume this doesn't yet work to a meaningful degree? Last time I checked
> > > there were quite a few tests that needed to be invoked in a specific
> > > directory.
> > 
> > It works - tap_check() does chdir().
> 
> Ah, I thought you'd implemented a target that does it all in one prove
> invocation...

I had some success with that, but it doesn't seem to be significantly faster -
it looks a lot like the tests are not actually running in parallel.  I tried
some variations like passing the list of dirs vs the list of files, and
--jobs=9 vs -j9, without success.

https://cirrus-ci.com/task/5580584675180544

https://github.com/justinpryzby/postgres/commit/a865adc5b8c
fc7b3ea8bce vcregress/ci: test modules/contrib with NO_INSTALLCHECK=1
03adb043d16 wip: vcsregress: add alltaptests
63bf0796ffd wip: vcregress: run alltaptests in parallel
9dc327f6b30 f!wip: vcregress: run alltaptests in a single prove invocation
a865adc5b8c tmp: run tap tests first

> > It currently fails in 027_stream_regress.pl, although I keep hoping that it
> > had been fixed...
> 
> That's likely because you're not setting REGRESS_OUTPUTDIR like
> src/test/recovery/Makefile and recoverycheck() are doing.

Yes, thanks.

-- 
Justin




Re: Adding CI to our tree

2022-02-05 Thread Andres Freund
Hi,

On 2022-02-03 23:04:04 -0600, Justin Pryzby wrote:
> > I assume this doesn't yet work to a meaningful degree? Last time I checked
> > there were quite a few tests that needed to be invoked in a specific
> > directory.
> 
> It works - tap_check() does chdir().

Ah, I thought you'd implemented a target that does it all in one prove
invocation...


> It currently fails in 027_stream_regress.pl, although I keep hoping that it
> had been fixed...

That's likely because you're not setting REGRESS_OUTPUTDIR like
src/test/recovery/Makefile and recoverycheck() are doing.


Greetings,

Andres Freund




Re: Adding CI to our tree

2022-02-03 Thread Andres Freund
Hi, 

On February 3, 2022 9:04:04 PM PST, Justin Pryzby  wrote:
>On Thu, Feb 03, 2022 at 11:57:18AM -0800, Andres Freund wrote:
>> On 2022-02-02 21:58:28 -0600, Justin Pryzby wrote:
>> > FYI: I've rebased these against your cirrus/windows changes.
>> 
>> What's the idea behind
>> #echo 'deb http://deb.debian.org/debian bullseye main' 
>> >>/etc/apt/sources.list
>> That's already in sources.list, so I'm not sure what this shows?
>
>At one point I thought I needed it - maybe all I needed was "apt-get update"..

Yes, that's needed. There's no old pre fetched package list, because it'd be 
outdated anyway, and work *sometimes* for some packages... They're also not 
small (image size influences job start speed heavily).


>> > 9d0f03d3450 wip: cirrus: code coverage
>>
>> I don't think it's good to just unconditionally reference the master branch
>> here. It'll do bogus stuff once 15 is branched off. It works for cfbot, but
>> not other uses.
>
>That's only used for filtering changed files.  It uses git diff --cherry-pick
>postgres/master..., which would *try* to avoid showing anything which is also
>in master.

The point is that master is not a relevant point of comparison when a commit in 
the 14 branch is tested.


Regards,

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Adding CI to our tree

2022-02-03 Thread Justin Pryzby
On Thu, Feb 03, 2022 at 11:57:18AM -0800, Andres Freund wrote:
> On 2022-02-02 21:58:28 -0600, Justin Pryzby wrote:
> > FYI: I've rebased these against your cirrus/windows changes.
> 
> Did you put then on a dedicated branch, or only intermixed with other changes?

Yes it's intermixed (because I have too many branches, and because in this case
it's useful to show the doc builds and coverage artifacts).

> > A recent cirrus result is here; this has other stuff in the branch, so you 
> > can
> > see the artifacts with HTML docs and code coverage.
> 
> I'm a bit worried about the increased storage and runtime overhead due to the
> docs changes. We probably can make it a good bit cheaper though.

If you mean overhead of additional disk operations, it shouldn't be an issue,
since the git clone uses shared references (not even hardlinks).

If you meant storage capacity, I'm only uploading the *changed* docs as
artifacts.  The motivation being that it's a lot more convenient to look though
a single .html, and not hundreds.

> What's the idea behind
> #echo 'deb http://deb.debian.org/debian bullseye main' >>/etc/apt/sources.list
> That's already in sources.list, so I'm not sure what this shows?

At one point I thought I needed it - maybe all I needed was "apt-get update"..

> > 9d0f03d3450 wip: cirrus: code coverage
>
> I don't think it's good to just unconditionally reference the master branch
> here. It'll do bogus stuff once 15 is branched off. It works for cfbot, but
> not other uses.

That's only used for filtering changed files.  It uses git diff --cherry-pick
postgres/master..., which would *try* to avoid showing anything which is also
in master.

> Is looking at the .c files in the change really a reliable predictor of where
> code coverage changes? I'm doubtful. Consider stuff like removing the last
> user of some infrastructure or such. Or adding the first.

You're right that it isn't particularly accurate, but it's a step forward if
lots of patches use it to check/improve coverge of new code.

In addition to the HTML generated for changed .c files, it's possible to create
a coverage.gcov output for everything, which could be retrieved to generate
full HTML locally.  It's ~8MB (or 2MB after gzip).

> > bff64e8b998 cirrus: upload html docs as artifacts
> > 80f52c3b172 wip: only upload changed docs
> 
> Similar to the above.

Do you mean it's not reliable ?  This is looking at which .html have changed
(not sgml).  Surely that's reliable ?

> > 654b6375401 wip: vcsregress: add alltaptests
> 
> I assume this doesn't yet work to a meaningful degree? Last time I checked
> there were quite a few tests that needed to be invoked in a specific
> directory.

It works - tap_check() does chdir().  But it's slow, and maybe should try to
implement a check-world target.  It currently fails in 027_stream_regress.pl,
although I keep hoping that it had been fixed...
https://cirrus-ci.com/task/6116235950686208

(BTW, I just realized that that commit should also remove the recoverycheck
call.)

-- 
Justin




Re: Adding CI to our tree

2022-02-03 Thread Andres Freund
Hi,

On 2022-02-02 21:58:28 -0600, Justin Pryzby wrote:
> FYI: I've rebased these against your cirrus/windows changes.

Did you put then on a dedicated branch, or only intermixed with other changes?


> A recent cirrus result is here; this has other stuff in the branch, so you can
> see the artifacts with HTML docs and code coverage.

I'm a bit worried about the increased storage and runtime overhead due to the
docs changes. We probably can make it a good bit cheaper though.


> https://github.com/justinpryzby/postgres/runs/5046465342


> 95793675633 cirrus: spell ccache_maxsize

Yep, will apply with a bunch of your other changes, if you answer a question
or two...


> e5286ede1b4 cirrus: avoid unnecessary double star **

Can't get excited about this, but whatever.

What I am excited about is that some of your other changes showed that we
don't need separate *_artifacts for separate directories anymore. That used to
be the case, but an array of paths is now supported. Putting log, diffs, and
regress_log in one directory will be considerably more convenient...


> 03f6de4643e cirrus: include hints how to install OS packages..

What's the idea behind

#echo 'deb http://deb.debian.org/debian bullseye main' >>/etc/apt/sources.list

That's already in sources.list, so I'm not sure what this shows?


I think it may be a bit cleaner to have the "install additional packages"
"template step" be just that, and not merge in other contents into it?


> 39cc2130e76 cirrus/linux: script test.log..

I still don't understand what this commit is trying to achieve.


> 398b7342349 cirrus/macos: uname -a and export at end of sysinfo

Shrug.


> 9d0f03d3450 wip: cirrus: code coverage

I don't think it's good to just unconditionally reference the master branch
here. It'll do bogus stuff once 15 is branched off. It works for cfbot, but
not other uses.

Perhaps we could have a cfbot special case (by checking for a repository
variable variable indicating the base branch) and show the incremental changes
to that branch? Or we could just check which branch has the smallest distance
in #commits?


If cfbot weren't a thing, I'd just make a code coverage / docs generation a
manual task (startable by a click in the UI). But that requires permission on
the repository...


Hm. I wonder if cfbot could maintain the code not as branches as such, but as
pull requests? Those include information about what the base branch is ;)


Is looking at the .c files in the change really a reliable predictor of where
code coverage changes? I'm doubtful. Consider stuff like removing the last
user of some infrastructure or such. Or adding the first.


> bff64e8b998 cirrus: upload html docs as artifacts
> 80f52c3b172 wip: only upload changed docs

Similar to the above.


> 7f3b50f1847 pg_upgrade: show list of files copied only in vebose mode

I think that should be discussed on a different thread.


> 97d7b039b9b vcregress/ci: test modules/contrib with NO_INSTALLCHECK=1

Probably also worth breaking out into a new thread.


> 654b6375401 wip: vcsregress: add alltaptests

I assume this doesn't yet work to a meaningful degree? Last time I checked
there were quite a few tests that needed to be invoked in a specific
directory.  In the meson branch I worked around that by having a wrapper
around the invocation of individual tap tests that changes CWD.

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-02-02 Thread Justin Pryzby
On Tue, Jan 18, 2022 at 03:08:47PM -0600, Justin Pryzby wrote:
> On Mon, Jan 17, 2022 at 12:16:19PM -0800, Andres Freund wrote:
> > I think it might still be worth adding stopgap way of running all tap tests 
> > on
> > windows though. Having a vcregress.pl function to find all directories with 
> > t/
> > and run the tests there, shouldn't be a lot of code...
> 
> I started doing that, however it makes CI/windows even slower.  I think it'll
> be necessary to run prove with all the tap tests to parallelize them, rather
> than looping around directories, many of which have only a single file, and 
> are
> run serially.

FYI: I've rebased these against your cirrus/windows changes.

A recent cirrus result is here; this has other stuff in the branch, so you can
see the artifacts with HTML docs and code coverage.

https://github.com/justinpryzby/postgres/runs/5046465342

95793675633 cirrus: spell ccache_maxsize
e5286ede1b4 cirrus: avoid unnecessary double star **
03f6de4643e cirrus: include hints how to install OS packages..
39cc2130e76 cirrus/linux: script test.log..
398b7342349 cirrus/macos: uname -a and export at end of sysinfo
9d0f03d3450 wip: cirrus: code coverage
bff64e8b998 cirrus: upload html docs as artifacts
80f52c3b172 wip: only upload changed docs
7f3b50f1847 pg_upgrade: show list of files copied only in vebose mode
ba229165fe1 wip: run pg_upgrade tests with data-checksums
97d7b039b9b vcregress/ci: test modules/contrib with NO_INSTALLCHECK=1
654b6375401 wip: vcsregress: add alltaptests

BTW, I think the double star added in f3feff825 probably doesn't need to be
double, either:

path: "crashlog-**.txt"

-- 
Justin




Re: pg_basebackup fsyncs some files despite --no-sync (was: Adding CI to our tree)

2022-01-23 Thread Andres Freund
Hi,

On 2022-01-21 12:00:57 -0800, Andres Freund wrote:
> The attached patch adds no-sync handling to the manifest rename, as well as
> one case in the directory wal method.

Pushed that.

Greetings,

Andres Freund




pg_basebackup fsyncs some files despite --no-sync (was: Adding CI to our tree)

2022-01-21 Thread Andres Freund
Hi,

On 2022-01-18 20:16:46 -0800, Andres Freund wrote:
> I noticed a few other sources of "unnecessary" fsyncs.  The most frequent
> being the durable_rename() of backup_manifest in pg_basebackup.c. Manifests 
> are
> surprisingly large, 135k for a freshly initdb'd cluster.

Robert, I assume the fsync for manifests isn't ignoring --no-sync for a
particular reason?

The attached patch adds no-sync handling to the manifest rename, as well as
one case in the directory wal method.


It's a bit painful that we have to have code like

if (dir_data->sync)
r = durable_rename(tmppath, tmppath2);
else
{
if (rename(tmppath, tmppath2) != 0)
{
pg_log_error("could not rename file 
\"%s\" to \"%s\": %m",
 tmppath, 
tmppath2);
r = -1;
}
}

It seems like it'd be better to set it up so that durable_rename() could
decide internally wether to fsync, or have a wrapper around durable_rename()?


> There's an fsync in walmethods.c:tar_close() that sounds intentional, but I
> don't really understand what the comment:
> 
>   /* Always fsync on close, so the padding gets fsynced */
>   if (tar_sync(f) < 0)

tar_sync() actually checks for tar_data->sync, so it doesn't do an
fsync. Arguably the comment is a bit confusing, but ...

Greetings,

Andres Freund
>From 36772524734f9f613edd6622d840e48f045f04d8 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 21 Jan 2022 10:30:37 -0800
Subject: [PATCH v1] pg_basebackup: Skip a few more fsyncs if --no-sync is
 specified.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/20220119041646.rhuo3youiqxqj...@alap3.anarazel.de
Backpatch:
---
 src/bin/pg_basebackup/pg_basebackup.c | 18 +++---
 src/bin/pg_basebackup/walmethods.c| 12 +++-
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index d5b0ade10d5..a5cca388845 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -2196,9 +2196,21 @@ BaseBackup(void)
 		snprintf(tmp_filename, MAXPGPATH, "%s/backup_manifest.tmp", basedir);
 		snprintf(filename, MAXPGPATH, "%s/backup_manifest", basedir);
 
-		/* durable_rename emits its own log message in case of failure */
-		if (durable_rename(tmp_filename, filename) != 0)
-			exit(1);
+		if (do_sync)
+		{
+			/* durable_rename emits its own log message in case of failure */
+			if (durable_rename(tmp_filename, filename) != 0)
+exit(1);
+		}
+		else
+		{
+			if (rename(tmp_filename, filename) != 0)
+			{
+pg_log_error("could not rename file \"%s\" to \"%s\": %m",
+			 tmp_filename, filename);
+exit(1);
+			}
+		}
 	}
 
 	if (verbose)
diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c
index f74bd13315c..a6d08c1270a 100644
--- a/src/bin/pg_basebackup/walmethods.c
+++ b/src/bin/pg_basebackup/walmethods.c
@@ -445,7 +445,17 @@ dir_close(Walfile f, WalCloseMethod method)
 			snprintf(tmppath2, sizeof(tmppath2), "%s/%s",
 	 dir_data->basedir, filename2);
 			pg_free(filename2);
-			r = durable_rename(tmppath, tmppath2);
+			if (dir_data->sync)
+r = durable_rename(tmppath, tmppath2);
+			else
+			{
+if (rename(tmppath, tmppath2) != 0)
+{
+	pg_log_error("could not rename file \"%s\" to \"%s\": %m",
+ tmppath, tmppath2);
+	r = -1;
+}
+			}
 		}
 		else if (method == CLOSE_UNLINK)
 		{
-- 
2.34.0



Re: Adding CI to our tree

2022-01-19 Thread Tom Lane
Andres Freund  writes:
> I'm doubtful that it's good that we use BAIL_OUT so liberally, because it
> prevents further tests from running (i.e. if 001 bails, 002+ doesn't run),
> which doesn't generally seem like the right thing to do after a single test
> fails. But that's really independent of the fix you make here.

Agreed, that's a separate question.  It does seem like "stop this script
and move to the next one" would be better behavior, though.

> I'd maybe do s/later/in the END block/ or such, so that one knows where to
> look. Took me a minute to find it again.

OK.

regards, tom lane




Re: Adding CI to our tree

2022-01-19 Thread Andres Freund
Hi,

On 2022-01-19 15:05:44 -0500, Tom Lane wrote:
> And the cause of that is obvious: Cluster::start thinks that if "pg_ctl
> start" failed, there couldn't possibly be a postmaster running.  So it
> doesn't bother to update self->_pid, and then the END hook thinks there is
> nothing to do.

Ah, right.

I'm doubtful that it's good that we use BAIL_OUT so liberally, because it
prevents further tests from running (i.e. if 001 bails, 002+ doesn't run),
which doesn't generally seem like the right thing to do after a single test
fails. But that's really independent of the fix you make here.


> The attached simple fix gets rid of this problem.  Any objections?

Nope, sounds like a plan.

> diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm 
> b/src/test/perl/PostgreSQL/Test/Cluster.pm
> index 7af0f8db13..fd0738d12d 100644
> --- a/src/test/perl/PostgreSQL/Test/Cluster.pm
> +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
> @@ -845,6 +845,11 @@ sub start
>   {
>   print "# pg_ctl start failed; logfile:\n";
>   print PostgreSQL::Test::Utils::slurp_file($self->logfile);
> +
> + # pg_ctl could have timed out, so check to see if there's a pid 
> file;
> + # without this, we fail to shut down the new postmaster later.
> + $self->_update_pid(-1);

I'd maybe do s/later/in the END block/ or such, so that one knows where to
look. Took me a minute to find it again.

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-01-19 Thread Tom Lane
I wrote:
> This test attempt revealed another problem too: the standby never
> shut down, and thus the calling "make" never quit, until I intervened
> manually.  I'm not sure why.  I see that Cluster::promote uses
> system_or_bail() to run "pg_ctl promote" ... could it be that
> BAIL_OUT causes the normal script END hooks to not get run?
> But it seems like we'd have noticed that long ago.

I failed to reproduce any failure in the promote step, and I now
think I was mistaken and it happened during the standby's initial
start.  I can reproduce that very easily by setting PGCTLTIMEOUT
to a second or two; with fsync enabled, it takes the standby more
than that to reach a consistent state.  And the cause of that
is obvious: Cluster::start thinks that if "pg_ctl start" failed,
there couldn't possibly be a postmaster running.  So it doesn't
bother to update self->_pid, and then the END hook thinks there
is nothing to do.

Now, leaving an idle postmaster hanging around isn't a mortal sin,
since it'll go away by itself shortly after the next cycle of
testing does an "rm -rf" on its data directory.  But it's ugly,
and conceivably it could cause problems for later testing on
machines with limited shmem or semaphore space.

The attached simple fix gets rid of this problem.  Any objections?

regards, tom lane

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 7af0f8db13..fd0738d12d 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -845,6 +845,11 @@ sub start
 	{
 		print "# pg_ctl start failed; logfile:\n";
 		print PostgreSQL::Test::Utils::slurp_file($self->logfile);
+
+		# pg_ctl could have timed out, so check to see if there's a pid file;
+		# without this, we fail to shut down the new postmaster later.
+		$self->_update_pid(-1);
+
 		BAIL_OUT("pg_ctl start failed") unless $params{fail_ok};
 		return 0;
 	}
@@ -1123,7 +1128,10 @@ archive_command = '$copy_command'
 	return;
 }
 
-# Internal method
+# Internal method to update $self->{_pid}
+# $is_running = 1: pid file should be there
+# $is_running = 0: pid file should NOT be there
+# $is_running = -1: we aren't sure
 sub _update_pid
 {
 	my ($self, $is_running) = @_;
@@ -1138,7 +1146,7 @@ sub _update_pid
 		close $pidfile;
 
 		# If we found a pidfile when there shouldn't be one, complain.
-		BAIL_OUT("postmaster.pid unexpectedly present") unless $is_running;
+		BAIL_OUT("postmaster.pid unexpectedly present") if $is_running == 0;
 		return;
 	}
 
@@ -1146,7 +1154,7 @@ sub _update_pid
 	print "# No postmaster PID for node \"$name\"\n";
 
 	# Complain if we expected to find a pidfile.
-	BAIL_OUT("postmaster.pid unexpectedly not present") if $is_running;
+	BAIL_OUT("postmaster.pid unexpectedly not present") if $is_running == 1;
 	return;
 }
 


Re: Adding CI to our tree

2022-01-18 Thread Tom Lane
Andres Freund  writes:
> On 2022-01-18 21:50:07 -0500, Tom Lane wrote:
>> This actually causes parallel check-world to fail altogether on florican's
>> host, because the initial fsync of the recovered primary takes more than 3
>> minutes when there's conflicting I/O traffic, causing pg_ctl to time out.

> Ugh.

I misspoke there: it's the standby that is performing an fsync'd
checkpoint and timing out, during the test's promote-the-standby
step.

This test attempt revealed another problem too: the standby never
shut down, and thus the calling "make" never quit, until I intervened
manually.  I'm not sure why.  I see that Cluster::promote uses
system_or_bail() to run "pg_ctl promote" ... could it be that
BAIL_OUT causes the normal script END hooks to not get run?
But it seems like we'd have noticed that long ago.

regards, tom lane




Re: Adding CI to our tree

2022-01-18 Thread Tom Lane
Andres Freund  writes:
> On 2022-01-18 21:50:07 -0500, Tom Lane wrote:
>> There is no reason for this script to be overriding Cluster.pm's
>> fsync = off setting.
>> This appears to go back to 917dc7d23 of 2016, so I think it just
>> predates our recognition that we should disable fsync in routine
>> tests.

> Yea, I noticed this too. I was wondering if there's a conceivable reason to
> actually want fsyncs, but I couldn't come up with one.

On the one hand, it feels a little wrong if our test suites never
reach our fsync calls at all.  On the other hand, it's not clear
what is meaningful about testing fsync when your test scenario
doesn't include a plug pull.

I'd be okay with having some exercise of the fsync code paths in
a test that is (a) designated for the purpose and (b) arranged
to not take an excessive amount of time, even under heavy load.
008_fsm_truncation.pl is neither of those things.  It seems
entirely random that it has fsync = on when we don't test that
elsewhere.

regards, tom lane




Re: Adding CI to our tree

2022-01-18 Thread Andres Freund
Hi,

On 2022-01-18 21:50:07 -0500, Tom Lane wrote:
> I just found one thing making check-world slower than it ought to be:
> src/test/recovery/t/008_fsm_truncation.pl does
> 
> $node_primary->append_conf(
>   'postgresql.conf', qq{
> fsync = on
> wal_log_hints = on
> max_prepared_transactions = 5
> autovacuum = off
> });
> 
> There is no reason for this script to be overriding Cluster.pm's
> fsync = off setting.
> 
> This appears to go back to 917dc7d23 of 2016, so I think it just
> predates our recognition that we should disable fsync in routine
> tests.

Yea, I noticed this too. I was wondering if there's a conceivable reason to
actually want fsyncs, but I couldn't come up with one.

On systems where IO isn't overloaded, the main problem with this test are
elsewhere: It multiple times waits for VACUUMs that are blocked truncating the
table. Which these days takes 5 seconds. Thus the test takes quite a while.

To me VACUUM_TRUNCATE_LOCK_TIMEOUT = 5s seems awfully long. On a system with a
lot of tables that's much more than vacuum will take. So this can easily lead
to using up all autovacuum workers...



> This actually causes parallel check-world to fail altogether on florican's
> host, because the initial fsync of the recovered primary takes more than 3
> minutes when there's conflicting I/O traffic, causing pg_ctl to time out.

Ugh.

I noticed a few other sources of "unnecessary" fsyncs.  The most frequent
being the durable_rename() of backup_manifest in pg_basebackup.c. Manifests are
surprisingly large, 135k for a freshly initdb'd cluster.


There's an fsync in walmethods.c:tar_close() that sounds intentional, but I
don't really understand what the comment:

/* Always fsync on close, so the padding gets fsynced */
if (tar_sync(f) < 0)

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-01-18 Thread Tom Lane
I just found one thing making check-world slower than it ought to be:
src/test/recovery/t/008_fsm_truncation.pl does

$node_primary->append_conf(
'postgresql.conf', qq{
fsync = on
wal_log_hints = on
max_prepared_transactions = 5
autovacuum = off
});

There is no reason for this script to be overriding Cluster.pm's
fsync = off setting.  This actually causes parallel check-world to
fail altogether on florican's host, because the initial fsync of
the recovered primary takes more than 3 minutes when there's
conflicting I/O traffic, causing pg_ctl to time out.

This appears to go back to 917dc7d23 of 2016, so I think it just
predates our recognition that we should disable fsync in routine
tests.

regards, tom lane




Re: Adding CI to our tree

2022-01-18 Thread Andres Freund
Hi,

On 2022-01-18 15:08:47 -0600, Justin Pryzby wrote:
> On Mon, Jan 17, 2022 at 12:16:19PM -0800, Andres Freund wrote:
> > I think it might still be worth adding stopgap way of running all tap tests 
> > on
> > windows though. Having a vcregress.pl function to find all directories with 
> > t/
> > and run the tests there, shouldn't be a lot of code...
> 
> I started doing that, however it makes CI/windows even slower.

To be expected... Perhaps the caching approach I just posted in [1] would buy
most of it back though...


> I think it'll be necessary to run prove with all the tap tests to
> parallelize them, rather than looping around directories, many of which have
> only a single file, and are run serially.

That's unfortunately not trivially possible. Quite a few tests currently rely
on being called in a specific directory. We should fix this, but it's not a
trivial amount of work.

Greetings,

Andres Freund

[1] https://postgr.es/m/20220119010034.javla5sakeh2a4fa%40alap3.anarazel.de




Re: Adding CI to our tree

2022-01-18 Thread Justin Pryzby
On Mon, Jan 17, 2022 at 12:16:19PM -0800, Andres Freund wrote:
> I think it might still be worth adding stopgap way of running all tap tests on
> windows though. Having a vcregress.pl function to find all directories with t/
> and run the tests there, shouldn't be a lot of code...

I started doing that, however it makes CI/windows even slower.  I think it'll
be necessary to run prove with all the tap tests to parallelize them, rather
than looping around directories, many of which have only a single file, and are
run serially.

https://github.com/justinpryzby/postgres/commits/citest-cirrus
This has the link to a recent cirrus result if you'd want to look.
I suppose I should start a new thread.  

There's a bunch of other stuff for cirrus in there (and bits and pieces of
other branches).

 .  cirrus: spell ccache_maxsize
 .  cirrus: avoid unnecessary double star **
 .  vcregress/ci: test modules/contrib with NO_INSTALLCHECK=1
 .  vcregress: style
 .  wip: vcsregress: add alltaptests
 .  wip: run upgrade tests with data-checksums
 .  pg_regress --initdb-opts
 .  wip: pg_upgrade: show list of files copied only in vebose mode
 .  wip: cirrus: include hints how to install OS packages..
 .  wip: cirrus: code coverage
 .  cirrus: upload html docs as artifacts
 .  wip: cirrus/windows: save compiled build as an artifact




Re: Adding CI to our tree

2022-01-18 Thread Andrew Dunstan


On 1/18/22 12:44, Andres Freund wrote:
> Hi,
>
> On 2022-01-18 11:20:08 -0500, Andrew Dunstan wrote:
>> Sure, very doable, although we will still need some special logic for
>> src/test - there are security implication from running the ssl, ldap and
>> kerberos tests by default. See its Makefile.
> ISTM that we should move the PG_TEST_EXTRA handling into the tests. Then we'd
> not need to duplicate them in the make / msvc world and we'd see them as
> skipped tests when not enabled.
>

Yeah, good idea. Especially if we can backpatch that. The buildfarm
client would also get simpler, so it would be doubleplusgood.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Adding CI to our tree

2022-01-18 Thread Andres Freund
Hi,

On 2022-01-18 11:20:08 -0500, Andrew Dunstan wrote:
> Sure, very doable, although we will still need some special logic for
> src/test - there are security implication from running the ssl, ldap and
> kerberos tests by default. See its Makefile.

ISTM that we should move the PG_TEST_EXTRA handling into the tests. Then we'd
not need to duplicate them in the make / msvc world and we'd see them as
skipped tests when not enabled.

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-01-18 Thread Andrew Dunstan


On 1/18/22 08:06, Peter Eisentraut wrote:
> On 17.01.22 22:13, Andrew Dunstan wrote:
>> Well, the logic we use for TAP tests is we run them for the following if
>> they have a t subdirectory, subject to configuration settings like
>> PG_TEST_EXTRA:
>>
>>   src/test/bin/*
>>   contrib/*
>>   src/test/modules/*
>>   src/test/*
>>
>> As long as any new TAP test gets place in such a location nothing new is
>> required in the buildfarm client.
>
> But if I wanted to add TAP tests to libpq, then I'd still be stuck. 
> Why not change the above list to "anywhere"?



Sure, very doable, although we will still need some special logic for
src/test - there are security implication from running the ssl, ldap and
kerberos tests by default. See its Makefile.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Adding CI to our tree

2022-01-18 Thread Peter Eisentraut

On 17.01.22 22:13, Andrew Dunstan wrote:

Well, the logic we use for TAP tests is we run them for the following if
they have a t subdirectory, subject to configuration settings like
PG_TEST_EXTRA:

  src/test/bin/*
  contrib/*
  src/test/modules/*
  src/test/*

As long as any new TAP test gets place in such a location nothing new is
required in the buildfarm client.


But if I wanted to add TAP tests to libpq, then I'd still be stuck.  Why 
not change the above list to "anywhere"?






Re: Adding CI to our tree

2022-01-17 Thread Andrew Dunstan


On 1/17/22 13:19, Andres Freund wrote:
> Hi,
>
> On 2022-01-17 10:25:12 -0500, Andrew Dunstan wrote:
>> The buildfarm is moving in the opposite direction, to disaggregate
>> steps.
> I'm a bit confused as to where you want changes to vcregress.pl
> going. Upthread you argued against adding more granular targets to
> vcregress. But this seems to be arguing against that?


OK, let me clear that up. Yes I argued upthread for a 'make check-world'
equivalent, because it's useful for developers on Windows. But I don't
really want to use it in the buildfarm client, where I prefer to run
fine-grained tests.


>
>
>> There are several reasons for that, including that it makes for
>> less log output that you need to churn through o find out what's gone
>> wrong in a particular case, and that it makes disabling certain test
>> sets via the buildfarm client's `skip-steps' feature possible.
> FWIW, to me this shouldn't require a lot of separate manual test
> invocations. And continuing to have lots of granular test invocations from the
> buildfarm client is *bad*, because it requires constantly syncing up the set
> of test targets.


Well, the logic we use for TAP tests is we run them for the following if
they have a t subdirectory, subject to configuration settings like
PG_TEST_EXTRA:

 src/test/bin/*

 contrib/*

 src/test/modules/*

 src/test/*


As long as any new TAP test gets place in such a location nothing new is
required in the buildfarm client.


>
> It also makes the buildfarm far slower than necessary, because it runs a lot
> of stuff serially that it could run in parallel. 


That's a legitimate concern. I have made some strides towards gross
parallelism in the buildfarm by providing for different branches to be
run in parallel. This has proven to be fairly successful (i.e. I have
not run into any complaints, and several of my own animals utilize it).
I can certainly look at doing something of the sort for an individual
branch run.


> This is particularly a
> problem for things like valgrind runs, where individual tests are quite slow -
> but just throwing more CPUs at it would help a lot.


When I ran a valgrind animal, `make check` was horribly slow, and it's
already parallelized. But it was on a VM and I forget how many CPUs the
VM had.


>
> We should set things up so that:
>
> a) The output of each test can easily be associated with the corresponding set
>of log files.
> b) The list of tests run can be determined generically by looking at the
>filesystems
> c) For each test run, it's easy to see whether it failed or succeeded
>
> As part of the meson stuff (which has its own test runner), I managed to get a
> bit towards this by changing the log output hierarchy so that each test gets
> its own directory for log files (regress_log_*, initdb.log, postmaster.log,
> regression.diffs, server log files). What's missing is a
> test.{failed,succeeded} marker or such, to make it easy to report the log
> files for just failed tasks.


One thing I have been working on is a way to split the log output of an
individual buildfarm step, hitherto just a text blob, , so that you can
easily navigate to say regress_log_0003-foo-step.log without having to
page through myriads of stuff. It's been on the back burner but I need
to prioritize it, maybe when the current CF is done.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Adding CI to our tree

2022-01-17 Thread Andres Freund
Hi,

On 2022-01-17 14:30:53 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > I think it's not actually that hard, with something like I described in the
> > email upthread, with each tests going into a prescribed location, and the
> > on-disk status being inspectable in an automated way. check-world could 
> > invoke
> > a command to summarize the tests at the end in a .NOTPARALLEL, to make the
> > local case easier.
>
> That sounds a bit, um, make-centric.  At this point it seems to me
> we ought to be thinking about how it'd work under meson.

Some of this is a lot easier with meson. It has a builtin test runner, which
understands tap (thereby not requiring prove anymore). Those tests can be
listed (meson test --list).

Depending on the option this results in a list of all tests with just the
"topline" name of passing tests and error output from failing tests, or all
output all the time or ... At the end it prints a summary of test counts and a
list of failed tests.

Here's an example (the leading timestamps are from CI, not meson), on windows:
https://api.cirrus-ci.com/v1/task/6009638771490816/logs/check.log

The test naming isn't necessarily great, but that's my fault.

Running all the tests with meson takes a good bit less time than running most,
but far from all, tests using vcregress.pl:
https://cirrus-ci.com/build/4611852939296768



meson test makes it far easier to spot which tests failed, it's consistent
across operating systems, allows to skip individual tests, etc.

However: It doesn't address the log collection issue in itself. For that we'd
still need to collect them in a way that's easier to associate with individual
tests.

In the meson branch I made it so that each test (including individual tap
ones) has it's own log directory, which makes it easier to select all the logs
for a failing test etc.


> > This subthread is about the windows tests specifically, where it's even 
> > worse
> > - there's no way to run all tests.
>
> That's precisely because the windows build doesn't use make.
> We shouldn't be thinking about inventing two separate dead-end
> solutions to this problem.

Agreed. I think some improvements, e.g. around making the logs easier to
associate with an individual test, is orthogonal to the buildsystem issue.


I think it might still be worth adding stopgap way of running all tap tests on
windows though. Having a vcregress.pl function to find all directories with t/
and run the tests there, shouldn't be a lot of code...

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-01-17 Thread Tom Lane
Andres Freund  writes:
> I think it's not actually that hard, with something like I described in the
> email upthread, with each tests going into a prescribed location, and the
> on-disk status being inspectable in an automated way. check-world could invoke
> a command to summarize the tests at the end in a .NOTPARALLEL, to make the
> local case easier.

That sounds a bit, um, make-centric.  At this point it seems to me
we ought to be thinking about how it'd work under meson.

> This subthread is about the windows tests specifically, where it's even worse
> - there's no way to run all tests.

That's precisely because the windows build doesn't use make.
We shouldn't be thinking about inventing two separate dead-end
solutions to this problem.

regards, tom lane




Re: Adding CI to our tree

2022-01-17 Thread Andres Freund
Hi,

On 2022-01-17 13:50:04 -0500, Robert Haas wrote:
> On Mon, Jan 17, 2022 at 1:19 PM Andres Freund  wrote:
> > FWIW, to me this shouldn't require a lot of separate manual test
> > invocations. And continuing to have lots of granular test invocations from 
> > the
> > buildfarm client is *bad*, because it requires constantly syncing up the set
> > of test targets.
> 
> I have a lot of sympathy with Andrew here, actually. If you just do
> 'make check-world' and assume that will cover everything, you get one
> giant output file. That is not great at all.

I very much agree with check-world output being near unusable.


>  That's really hard to accomplish if you just run all the tests with one
> invocation - any technique you use to find the boundaries between one test's
> output and the next will prove to be unreliable.

I think it's not actually that hard, with something like I described in the
email upthread, with each tests going into a prescribed location, and the
on-disk status being inspectable in an automated way. check-world could invoke
a command to summarize the tests at the end in a .NOTPARALLEL, to make the
local case easier.

pg_regress/isolation style tests have the "summary" output in regression.out,
but we remove it on success.
Tap tests have their output in the regress_log_* files, however these are far
more verbose than the output from running the tests directly.

I wonder if we should keep regression.out and also keep a copy of the
"shorter" tap test output in a file?


> But having said that, I also agree that it sucks to have to keep
> updating the BF client every time we want to do any kind of
> test-related changes in the main source tree.

It also sucks locally. I *hate* having to dig through a long check-world
output to find the first failure.

This subthread is about the windows tests specifically, where it's even worse
- there's no way to run all tests. Nor a list of test targets that's even
halfway complete :/. One just has to know that one has to invoke a myriad of
vcregress.pl taptest path/to/directory

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-01-17 Thread Tom Lane
Robert Haas  writes:
> I have a lot of sympathy with Andrew here, actually. If you just do
> 'make check-world' and assume that will cover everything, you get one
> giant output file. That is not great at all.

Yeah.  I agree with Andrew that we want output that is more modular,
not less so.  But we do need to find a way to have less knowledge
hard-wired in the buildfarm client script.

> But having said that, I also agree that it sucks to have to keep
> updating the BF client every time we want to do any kind of
> test-related changes in the main source tree. One way around that
> would be to put a file in the main source tree that the build farm
> client can read to know what to do. Another would be to have the BF
> client download the latest list of steps from somewhere instead of
> having it in the source code, so that it can be updated without
> everyone needing to update their machine.

The obvious place for "somewhere" is "the main source tree", so I
doubt your second suggestion is better than your first.  But your
first does seem like a plausible way to proceed.

Another way to think of it, maybe, is to migrate chunks of the
buildfarm client script itself into the source tree.  I'd rather
that developers not need to become experts on the buildfarm client
to make adjustments to the test process --- but I suspect that
a simple script like "run make check in these directories" is
not going to be flexible enough for everything.

regards, tom lane




Re: Adding CI to our tree

2022-01-17 Thread Robert Haas
On Mon, Jan 17, 2022 at 1:19 PM Andres Freund  wrote:
> FWIW, to me this shouldn't require a lot of separate manual test
> invocations. And continuing to have lots of granular test invocations from the
> buildfarm client is *bad*, because it requires constantly syncing up the set
> of test targets.

I have a lot of sympathy with Andrew here, actually. If you just do
'make check-world' and assume that will cover everything, you get one
giant output file. That is not great at all. People who are looking
through buildfarm results do not want to have to look through giant
logfiles hunting for the failure; they want to look at the stuff
that's just directly relevant to the failure they saw. The current BF
is actually pretty bad at this. You can click on various things on a
buildfarm results page, but it's not very clear where those links are
taking you, and the pages at least in my browser (normally Chrome)
render so slowly as to make the whole thing almost unusable. I'd like
to have a thing where the buildfarm shows a list of tests in red or
green and I can click links next to each test to see the various logs
that test produced. That's really hard to accomplish if you just run
all the tests with one invocation - any technique you use to find the
boundaries between one test's output and the next will prove to be
unreliable.

But having said that, I also agree that it sucks to have to keep
updating the BF client every time we want to do any kind of
test-related changes in the main source tree. One way around that
would be to put a file in the main source tree that the build farm
client can read to know what to do. Another would be to have the BF
client download the latest list of steps from somewhere instead of
having it in the source code, so that it can be updated without
everyone needing to update their machine. There might well be other
approaches that are even better. But the "ask Andrew to adjust the BF
client and then have everybody install the new version" approach upon
which we have been relying up until now is not terribly scalable.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Adding CI to our tree

2022-01-17 Thread Andres Freund
Hi,

On 2022-01-17 10:25:12 -0500, Andrew Dunstan wrote:
> The buildfarm is moving in the opposite direction, to disaggregate
> steps.

I'm a bit confused as to where you want changes to vcregress.pl
going. Upthread you argued against adding more granular targets to
vcregress. But this seems to be arguing against that?


> There are several reasons for that, including that it makes for
> less log output that you need to churn through o find out what's gone
> wrong in a particular case, and that it makes disabling certain test
> sets via the buildfarm client's `skip-steps' feature possible.

FWIW, to me this shouldn't require a lot of separate manual test
invocations. And continuing to have lots of granular test invocations from the
buildfarm client is *bad*, because it requires constantly syncing up the set
of test targets.

It also makes the buildfarm far slower than necessary, because it runs a lot
of stuff serially that it could run in parallel. This is particularly a
problem for things like valgrind runs, where individual tests are quite slow -
but just throwing more CPUs at it would help a lot.

We should set things up so that:

a) The output of each test can easily be associated with the corresponding set
   of log files.
b) The list of tests run can be determined generically by looking at the
   filesystems
c) For each test run, it's easy to see whether it failed or succeeded

As part of the meson stuff (which has its own test runner), I managed to get a
bit towards this by changing the log output hierarchy so that each test gets
its own directory for log files (regress_log_*, initdb.log, postmaster.log,
regression.diffs, server log files). What's missing is a
test.{failed,succeeded} marker or such, to make it easy to report the log
files for just failed tasks.

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-01-17 Thread Andrew Dunstan


On 1/14/22 18:54, Justin Pryzby wrote:
> On Fri, Jan 14, 2022 at 03:34:11PM -0800, Andres Freund wrote:
>> Hi,
>>
>> On 2022-01-13 15:27:40 -0500, Andrew Dunstan wrote:
>>> I can probably adjust to whatever we decide to do. But I think we're
>>> really just tinkering at the edges here. What I think we really need is
>>> the moral equivalent of `make check-world` in one invocation of
>>> vcregress.pl.
>> I agree strongly that we need that. But I think a good chunk of Justin's
>> changes are actually required to get there?
>>
>> Specifically, unless we want lots of duplicated logic in vcregress.pl, we
>> need to make vcregress know how to run NO_INSTALLCHECK test. The option added
>> was just so the buildfarm doesn't start to run tests multiple times...
> The main reason I made the INSTALLCHECK runs conditional (they only run if a
> new option is specified) is because of these comments:
>
> | # Disabled because these tests require 
> "shared_preload_libraries=pg_stat_statements",
> | # which typical installcheck users do not have (e.g. buildfarm clients).
> | NO_INSTALLCHECK = 1
>
> Also, I saw that you saw that Thomas discovered/pointed out that a bunch of 
> TAP
> tests aren't being run by CI.   I think vcregress should have an "alltap"
> target that runs everything like glob("**/t/").  CI would use that instead of
> the existing ssl, auth, subscription, recovery, and bin targets.  The 
> buildfarm
> could switch to that after it's been published.
>
> https://www.postgresql.org/message-id/20220114234947.av4kkhuj7netsy5r%40alap3.anarazel.de




The buildfarm is moving in the opposite direction, to disaggregate
steps. There are several reasons for that, including that it makes for
less log output that you need to churn through o find out what's gone
wrong in a particular case, and that it makes disabling certain test
sets via the buildfarm client's `skip-steps' feature possible.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Adding CI to our tree

2022-01-14 Thread Justin Pryzby
On Fri, Jan 14, 2022 at 03:34:11PM -0800, Andres Freund wrote:
> Hi,
> 
> On 2022-01-13 15:27:40 -0500, Andrew Dunstan wrote:
> > I can probably adjust to whatever we decide to do. But I think we're
> > really just tinkering at the edges here. What I think we really need is
> > the moral equivalent of `make check-world` in one invocation of
> > vcregress.pl.
> 
> I agree strongly that we need that. But I think a good chunk of Justin's
> changes are actually required to get there?
> 
> Specifically, unless we want lots of duplicated logic in vcregress.pl, we
> need to make vcregress know how to run NO_INSTALLCHECK test. The option added
> was just so the buildfarm doesn't start to run tests multiple times...

The main reason I made the INSTALLCHECK runs conditional (they only run if a
new option is specified) is because of these comments:

| # Disabled because these tests require 
"shared_preload_libraries=pg_stat_statements",
| # which typical installcheck users do not have (e.g. buildfarm clients).
| NO_INSTALLCHECK = 1

Also, I saw that you saw that Thomas discovered/pointed out that a bunch of TAP
tests aren't being run by CI.   I think vcregress should have an "alltap"
target that runs everything like glob("**/t/").  CI would use that instead of
the existing ssl, auth, subscription, recovery, and bin targets.  The buildfarm
could switch to that after it's been published.

https://www.postgresql.org/message-id/20220114234947.av4kkhuj7netsy5r%40alap3.anarazel.de

-- 
Justin




Re: Adding CI to our tree

2022-01-14 Thread Andres Freund
Hi,

On 2022-01-13 15:27:40 -0500, Andrew Dunstan wrote:
> I can probably adjust to whatever we decide to do. But I think we're
> really just tinkering at the edges here. What I think we really need is
> the moral equivalent of `make check-world` in one invocation of
> vcregress.pl.

I agree strongly that we need that. But I think a good chunk of Justin's
changes are actually required to get there?

Specifically, unless we want lots of duplicated logic in vcregress.pl, we
need to make vcregress know how to run NO_INSTALLCHECK test. The option added
was just so the buildfarm doesn't start to run tests multiple times...

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-01-13 Thread Andrew Dunstan


On 1/13/22 13:55, Andres Freund wrote:
>> It needs to avoid running on the buildfarm, right ?
> I guess so. It currently appears to have its own logic for finding contrib
> (and other) tap tests:
>
> foreach my $testdir (glob("$pgsql/contrib/*"))
> {
> next unless -d "$testdir/t";
> my $testname = basename($testdir);
> next unless step_wanted("contrib-$testname");
> print time_str(), "running contrib test $testname ...\n" if 
> $verbose;
> run_tap_test("$testdir", "contrib-$testname", undef);
> }
>
> but does run vcregress contribcheck, modulescheck.
>
>
> Andrew, do you see a better way than what Justin is proposing here?
>

I can probably adjust to whatever we decide to do. But I think we're
really just tinkering at the edges here. What I think we really need is
the moral equivalent of `make check-world` in one invocation of
vcregress.pl.

While we're on the subject of vcregress.pl, there's this recent
discussion, which is on my list of things to return to:



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Adding CI to our tree

2022-01-13 Thread Andres Freund
Hi,

On 2022-01-13 13:06:42 -0600, Justin Pryzby wrote:
> Question: are data-checksums tested at all ?  The only thing I can find is 
> that
> some buildfarm members *might* exercise it during installcheck.

There's some coverage via src/bin/pg_basebackup/t/010_pg_basebackup.pl and
src/bin/pg_checksums/t/002_actions.pl - but that's not a whole lot.

Might be worth converting one of the "additional" pg_regress runs to use
data-checksums? E.g. pg_upgrade's, or the one being introduced in the "replay"
test?
https://postgr.es/m/CA%2BhUKGK-%2Bmg6RWiDu0JudF6jWeL5%2BgPmi8EKUm1eAzmdbwiE_A%40mail.gmail.com


> From b67cd05895c8fa42e13e6743db36412a68292607 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Sun, 9 Jan 2022 22:54:32 -0600
> Subject: [PATCH 2/7] CI: run initdb with --no-sync for windows

Applied this already.



> From 885becd19f630a69ab1de44cefcdda21ca8146d6 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Tue, 11 Jan 2022 01:30:37 -0600
> Subject: [PATCH 4/7] cirrus/linux: script test.log..
> 
> For consistency, and because otherwise errors can be lost (?) or hard to find.

> -  make -s ${CHECK} ${CHECKFLAGS} -j${TEST_JOBS}
> +  script --command "make -s ${CHECK} ${CHECKFLAGS} -j${TEST_JOBS}" 
> test.log
>  EOF

I'm not following this one? all the output is in the CI run already, you can
download it already as well?

The only reason to use script as a wrapper is that otherwise make on
freebsd/macos warns about fcntl failures?


>only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || 
> $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*linux.*'
> @@ -183,7 +178,7 @@ task:
>  mkdir -p ${CCACHE_DIR}
>  chown -R postgres:postgres ${CCACHE_DIR}
>  echo '* - memlock 134217728' > /etc/security/limits.d/postgres.conf
> -su postgres -c "ulimit -l -H && ulimit -l -S"
> +su postgres -c "ulimit -l -H && ulimit -l -S" # XXX

Hm?


Greetings,

Andres Freund




Re: Adding CI to our tree

2022-01-13 Thread Justin Pryzby
On Thu, Jan 13, 2022 at 10:55:27AM -0800, Andres Freund wrote:
> I'll probably apply that part and 0002 separately.

I've hacked on it a bit more now..

Question: are data-checksums tested at all ?  The only thing I can find is that
some buildfarm members *might* exercise it during installcheck.

I added pg_regress --initdb-opts since that seems to be a deficiency.

-- 
Justin
>From 10121f588d0b3cab67ee810a718511879d6f24a8 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 9 Jan 2022 18:25:02 -0600
Subject: [PATCH 1/7] vcregress/ci: test modules/contrib with NO_INSTALLCHECK=1

---
 .cirrus.yml|  4 +-
 contrib/pg_stat_statements/Makefile|  2 +-
 contrib/test_decoding/Makefile |  2 +-
 src/test/modules/snapshot_too_old/Makefile |  2 +-
 src/test/modules/worker_spi/Makefile   |  2 +-
 src/tools/msvc/vcregress.pl| 46 +++---
 6 files changed, 47 insertions(+), 11 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 19b3737fa11..02ea7e67189 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -398,9 +398,9 @@ task:
   test_isolation_script:
 - perl src/tools/msvc/vcregress.pl isolationcheck
   test_modules_script:
-- perl src/tools/msvc/vcregress.pl modulescheck
+- perl src/tools/msvc/vcregress.pl modulescheck install
   test_contrib_script:
-- perl src/tools/msvc/vcregress.pl contribcheck
+- perl src/tools/msvc/vcregress.pl contribcheck install
   stop_script:
 - tmp_install\bin\pg_ctl.exe stop -D tmp_check/db -l tmp_check/postmaster.log
   test_ssl_script:
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 7fabd96f38d..d732e1ade73 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -15,7 +15,7 @@ PGFILEDESC = "pg_stat_statements - execution statistics of SQL statements"
 
 LDFLAGS_SL += $(filter -lm, $(LIBS))
 
-REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
+REGRESS_OPTS = --temp-config=$(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
 REGRESS = pg_stat_statements oldextversions
 # Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
 # which typical installcheck users do not have (e.g. buildfarm clients).
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 9a31e0b8795..14fd847ba7f 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -10,7 +10,7 @@ ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
 	oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
 	twophase_snapshot
 
-REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
+REGRESS_OPTS = --temp-config=$(top_srcdir)/contrib/test_decoding/logical.conf
 ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 
 # Disabled because these tests require "wal_level=logical", which
diff --git a/src/test/modules/snapshot_too_old/Makefile b/src/test/modules/snapshot_too_old/Makefile
index dfb4537f63c..752a0039fdc 100644
--- a/src/test/modules/snapshot_too_old/Makefile
+++ b/src/test/modules/snapshot_too_old/Makefile
@@ -5,7 +5,7 @@
 EXTRA_CLEAN = $(pg_regress_clean_files)
 
 ISOLATION = sto_using_cursor sto_using_select sto_using_hash_index
-ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/sto.conf
+ISOLATION_OPTS = --temp-config=$(top_srcdir)/src/test/modules/snapshot_too_old/sto.conf
 
 # Disabled because these tests require "old_snapshot_threshold" >= 0, which
 # typical installcheck users do not have (e.g. buildfarm clients).
diff --git a/src/test/modules/worker_spi/Makefile b/src/test/modules/worker_spi/Makefile
index cbf9b2e37fd..d9f7d9bab6d 100644
--- a/src/test/modules/worker_spi/Makefile
+++ b/src/test/modules/worker_spi/Makefile
@@ -9,7 +9,7 @@ PGFILEDESC = "worker_spi - background worker example"
 REGRESS = worker_spi
 
 # enable our module in shared_preload_libraries for dynamic bgworkers
-REGRESS_OPTS = --temp-config $(top_srcdir)/src/test/modules/worker_spi/dynamic.conf
+REGRESS_OPTS = --temp-config=$(top_srcdir)/src/test/modules/worker_spi/dynamic.conf
 
 # Disable installcheck to ensure we cover dynamic bgworkers.
 NO_INSTALLCHECK = 1
diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index 7575acdfdf5..b24ea11aa4a 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -443,6 +443,7 @@ sub plcheck
 sub subdircheck
 {
 	my $module = shift;
+	my $installcheck = shift || 1;
 
 	if (   !-d "$module/sql"
 		|| !-d "$module/expected"
@@ -452,7 +453,7 @@ sub subdircheck
 	}
 
 	chdir $module;
-	my @tests = fetchTests();
+	my @tests = fetchTests($installcheck);
 
 	# Leave if no tests are listed in the module.
 	if (scalar @tests == 0)
@@ -462,6 +463,7 @@ sub subdircheck
 	}
 
 	my @opts = fetchRegressOpts();
+	push @opts, "--temp-instance=tmp_check" 

Re: Adding CI to our tree

2022-01-13 Thread Andres Freund
Hi,

On 2022-01-10 16:07:48 -0600, Justin Pryzby wrote:
> On Sun, Jan 09, 2022 at 11:57:44AM -0800, Andres Freund wrote:
> > On 2022-01-09 13:16:50 -0600, Justin Pryzby wrote:
> > > diff --git a/contrib/test_decoding/Makefile 
> > > b/contrib/test_decoding/Makefile
> > > index 9a31e0b8795..14fd847ba7f 100644
> > > --- a/contrib/test_decoding/Makefile
> > > +++ b/contrib/test_decoding/Makefile
> > > @@ -10,7 +10,7 @@ ISOLATION = mxact delayed_startup ondisk_startup 
> > > concurrent_ddl_dml \
> > >   oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
> > >   twophase_snapshot
> > >
> > > -REGRESS_OPTS = --temp-config 
> > > $(top_srcdir)/contrib/test_decoding/logical.conf
> > > +REGRESS_OPTS = 
> > > --temp-config=$(top_srcdir)/contrib/test_decoding/logical.conf
> > >  ISOLATION_OPTS = --temp-config 
> > > $(top_srcdir)/contrib/test_decoding/logical.conf
> >
> > Not sure why these are part of the diff?
>
> Because otherwise vcregress runs pg_regress --temp-config test1 test2 [...]
> ..which means test1 gets eaten as the argument to --temp-config

Ah. I see you changed that globally, good...

I'll probably apply that part and 0002 separately.


> > This doesn't really seem like a scalable path forward - duplicating
> > configuration in more places doesn't seem sane. It seems it'd make more 
> > sense
> > to teach vcregress.pl to run NO_INSTALLCHECK targets properly? ISTM that
> > changing the options passed to pg_regress based on fetchTests() return value
> > wouldn't be too hard?
>
> It needs to run the tests with separate instance.  Maybe you're suggesting to
> use --temp-instance.

Yes.


> It needs to avoid running on the buildfarm, right ?

I guess so. It currently appears to have its own logic for finding contrib
(and other) tap tests:

foreach my $testdir (glob("$pgsql/contrib/*"))
{
next unless -d "$testdir/t";
my $testname = basename($testdir);
next unless step_wanted("contrib-$testname");
print time_str(), "running contrib test $testname ...\n" if 
$verbose;
run_tap_test("$testdir", "contrib-$testname", undef);
}

but does run vcregress contribcheck, modulescheck.


Andrew, do you see a better way than what Justin is proposing here?

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-01-13 Thread Andres Freund
Hi,

On 2021-12-30 17:46:52 -0800, Andres Freund wrote:
> I plan to push this after another cycle of tests passing (and driving
> over the bay bridge...) unless I hear protests.

I noticed that it's harder to see compile warnings on msvc in CI than it was
at an earlier point.  There used to be a summary of errors at the end.

That turns out to be an uninteded consequence of the option to reduce msbuild
verbosity.


> +# Use parallelism, disable file tracker, we're never going to rebuild...
> +MSBFLAGS: -m -verbosity:minimal /p:TrackFileAccess=false

Unless somebody protests quickly, I'm going to add
  "-consoleLoggerParameters:Summary;ForceNoAlign"
to that.

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-01-10 Thread Justin Pryzby
On Sun, Jan 09, 2022 at 11:57:44AM -0800, Andres Freund wrote:
> On 2022-01-09 13:16:50 -0600, Justin Pryzby wrote:
> > diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
> > index 9a31e0b8795..14fd847ba7f 100644
> > --- a/contrib/test_decoding/Makefile
> > +++ b/contrib/test_decoding/Makefile
> > @@ -10,7 +10,7 @@ ISOLATION = mxact delayed_startup ondisk_startup 
> > concurrent_ddl_dml \
> > oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
> > twophase_snapshot
> >  
> > -REGRESS_OPTS = --temp-config 
> > $(top_srcdir)/contrib/test_decoding/logical.conf
> > +REGRESS_OPTS = 
> > --temp-config=$(top_srcdir)/contrib/test_decoding/logical.conf
> >  ISOLATION_OPTS = --temp-config 
> > $(top_srcdir)/contrib/test_decoding/logical.conf
> 
> Not sure why these are part of the diff?

Because otherwise vcregress runs pg_regress --temp-config test1 test2 [...]
..which means test1 gets eaten as the argument to --temp-config

> > diff --git a/src/tools/ci/pg_ci_base.conf b/src/tools/ci/pg_ci_base.conf
> > index d8faa9c26c1..52cdb697a57 100644
> > --- a/src/tools/ci/pg_ci_base.conf
> > +++ b/src/tools/ci/pg_ci_base.conf
> > @@ -12,3 +12,24 @@ log_connections = true
> >  log_disconnections = true
> >  log_line_prefix = '%m [%p][%b] %q[%a][%v:%x] '
> >  log_lock_waits = true
> > +
> > +# test_decoding
> > +wal_level = logical
> > +max_replication_slots = 4
> > +logical_decoding_work_mem = 64kB
> > [ more ]
> 
> This doesn't really seem like a scalable path forward - duplicating
> configuration in more places doesn't seem sane. It seems it'd make more sense
> to teach vcregress.pl to run NO_INSTALLCHECK targets properly? ISTM that
> changing the options passed to pg_regress based on fetchTests() return value
> wouldn't be too hard?

It needs to run the tests with separate instance.  Maybe you're suggesting to
use --temp-instance.

It needs to avoid running on the buildfarm, right ?

-- 
Justin
>From 0818f79b27de42182e26dd9dad991de8258c8238 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 9 Jan 2022 18:25:02 -0600
Subject: [PATCH 1/3] vcregress/ci: test modules/contrib with NO_INSTALLCHECK=1

---
 .cirrus.yml|  4 +-
 contrib/pg_stat_statements/Makefile|  2 +-
 contrib/test_decoding/Makefile |  2 +-
 src/test/modules/snapshot_too_old/Makefile |  2 +-
 src/test/modules/worker_spi/Makefile   |  2 +-
 src/tools/msvc/vcregress.pl| 46 +++---
 6 files changed, 47 insertions(+), 11 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 19b3737fa11..02ea7e67189 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -398,9 +398,9 @@ task:
   test_isolation_script:
 - perl src/tools/msvc/vcregress.pl isolationcheck
   test_modules_script:
-- perl src/tools/msvc/vcregress.pl modulescheck
+- perl src/tools/msvc/vcregress.pl modulescheck install
   test_contrib_script:
-- perl src/tools/msvc/vcregress.pl contribcheck
+- perl src/tools/msvc/vcregress.pl contribcheck install
   stop_script:
 - tmp_install\bin\pg_ctl.exe stop -D tmp_check/db -l tmp_check/postmaster.log
   test_ssl_script:
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 7fabd96f38d..d732e1ade73 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -15,7 +15,7 @@ PGFILEDESC = "pg_stat_statements - execution statistics of SQL statements"
 
 LDFLAGS_SL += $(filter -lm, $(LIBS))
 
-REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
+REGRESS_OPTS = --temp-config=$(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
 REGRESS = pg_stat_statements oldextversions
 # Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
 # which typical installcheck users do not have (e.g. buildfarm clients).
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 9a31e0b8795..14fd847ba7f 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -10,7 +10,7 @@ ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
 	oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
 	twophase_snapshot
 
-REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
+REGRESS_OPTS = --temp-config=$(top_srcdir)/contrib/test_decoding/logical.conf
 ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 
 # Disabled because these tests require "wal_level=logical", which
diff --git a/src/test/modules/snapshot_too_old/Makefile b/src/test/modules/snapshot_too_old/Makefile
index dfb4537f63c..752a0039fdc 100644
--- a/src/test/modules/snapshot_too_old/Makefile
+++ b/src/test/modules/snapshot_too_old/Makefile
@@ -5,7 +5,7 @@
 EXTRA_CLEAN = $(pg_regress_clean_files)
 
 ISOLATION = sto_using_cursor sto_using_select sto_using_hash_index
-ISOLATION_OPTS = --temp-config 

Re: Adding CI to our tree

2022-01-09 Thread Andres Freund
Hi,

On 2021-10-02 12:59:09 -0700, Andres Freund wrote:
> On 2021-10-02 11:05:20 -0400, Tom Lane wrote:
> > I don't know enough about Windows to evaluate 0001, but I'm a little
> > worried about it because it looks like it's changing our *production*
> > error handling on that platform.
> 
> Yea. It's clearly not ready as-is - it's the piece that I was planning to
> write a separate email about.

> 
> It's hard to understand what *precisely* SEM_NOGPFAULTERRORBOX etc do.
> 
> What I do know is that without the _set_abort_behavior() stuff abort() doesn't
> trigger windows' "crash" paths in at least debugging builds, and that the
> SetErrorMode() and _CrtSetReportMode() changes are necessary to get segfaults
> to reach the crash paths.
> 
> The in-tree behaviour turns out to make debugging on windows a major pain, at
> least when compiling with msvc. Crashes never trigger core dumps or "just in
> time" debugging (their term for invoking a debugger upon crash), so one has to
> attach to processes before they crash, to have any chance of debugging.
> 
> As far as I can tell this also means that at least for debugging builds,
> pgwin32_install_crashdump_handler() is pretty much dead weight -
> crashDumpHandler() never gets invoked. I think it may get invoked for abort()s
> in production builds, but probably not for segfaults.
> 
> And despite SEM_NOGPFAULTERRORBOX we display those annoying "popup" boxes
> telling us about the crash and giving the option to retry, ignore, something
> something.   It's all a bit baffling.

FWIW, the latest version of this patch (including an explanation why
SEM_NOGPFAULTERRORBOX isn't useful for our purposes [anymore]) is at (and
above)
https://postgr.es/m/20220110005704.es4el6i2nxlxzwof%40alap3.anarazel.de

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-01-09 Thread Andres Freund
Hi,

On 2022-01-09 13:16:50 -0600, Justin Pryzby wrote:
> I noticed a patch failing in cfbot everywhere except windows:
> 
> https://commitfest.postgresql.org/36/3476/
> | Invalid relcache when ADD PRIMARY KEY USING INDEX
> 
> It's because vcregress skips tests which have NO_INSTALLCHECK=1.

> Is it desirable to enable more module/contrib tests for windows CI ?

Yes. I think the way we run windows tests is pretty bad - it's not reasonable
that each developer needs to figure out 20 magic incantations to run all tests
on windows.


> This does a few, but there's a few others which would require the server to
> be restarted to set shared_preload_libraries for each module.
> 
> diff --git a/.cirrus.yml b/.cirrus.yml
> index 19b3737fa11..c427b468334 100644
> --- a/.cirrus.yml
> +++ b/.cirrus.yml
> @@ -390,7 +390,7 @@ task:
>  - perl src/tools/msvc/vcregress.pl check parallel
>startcreate_script:
>  # paths to binaries need backslashes
> -- tmp_install\bin\pg_ctl.exe initdb -D tmp_check/db -l 
> tmp_check/initdb.log
> +- tmp_install\bin\pg_ctl.exe initdb -D tmp_check/db -l 
> tmp_check/initdb.log --options=--no-sync
>  - echo include '%TEMP_CONFIG%' >> tmp_check/db/postgresql.conf
>  - tmp_install\bin\pg_ctl.exe start -D tmp_check/db -l 
> tmp_check/postmaster.log
>test_pl_script:

> diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
> index 9a31e0b8795..14fd847ba7f 100644
> --- a/contrib/test_decoding/Makefile
> +++ b/contrib/test_decoding/Makefile
> @@ -10,7 +10,7 @@ ISOLATION = mxact delayed_startup ondisk_startup 
> concurrent_ddl_dml \
>   oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
>   twophase_snapshot
>  
> -REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
> +REGRESS_OPTS = --temp-config=$(top_srcdir)/contrib/test_decoding/logical.conf
>  ISOLATION_OPTS = --temp-config 
> $(top_srcdir)/contrib/test_decoding/logical.conf
>

Not sure why these are part of the diff?


> diff --git a/src/tools/ci/pg_ci_base.conf b/src/tools/ci/pg_ci_base.conf
> index d8faa9c26c1..52cdb697a57 100644
> --- a/src/tools/ci/pg_ci_base.conf
> +++ b/src/tools/ci/pg_ci_base.conf
> @@ -12,3 +12,24 @@ log_connections = true
>  log_disconnections = true
>  log_line_prefix = '%m [%p][%b] %q[%a][%v:%x] '
>  log_lock_waits = true
> +
> +# test_decoding
> +wal_level = logical
> +max_replication_slots = 4
> +logical_decoding_work_mem = 64kB
> [ more ]

This doesn't really seem like a scalable path forward - duplicating
configuration in more places doesn't seem sane. It seems it'd make more sense
to teach vcregress.pl to run NO_INSTALLCHECK targets properly? ISTM that
changing the options passed to pg_regress based on fetchTests() return value
wouldn't be too hard?

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-01-09 Thread Justin Pryzby
I noticed a patch failing in cfbot everywhere except windows:

https://commitfest.postgresql.org/36/3476/
| Invalid relcache when ADD PRIMARY KEY USING INDEX

It's because vcregress skips tests which have NO_INSTALLCHECK=1.

Is it desirable to enable more module/contrib tests for windows CI ?

This does a few, but there's a few others which would require the server to
be restarted to set shared_preload_libraries for each module.

diff --git a/.cirrus.yml b/.cirrus.yml
index 19b3737fa11..c427b468334 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -390,7 +390,7 @@ task:
 - perl src/tools/msvc/vcregress.pl check parallel
   startcreate_script:
 # paths to binaries need backslashes
-- tmp_install\bin\pg_ctl.exe initdb -D tmp_check/db -l tmp_check/initdb.log
+- tmp_install\bin\pg_ctl.exe initdb -D tmp_check/db -l 
tmp_check/initdb.log --options=--no-sync
 - echo include '%TEMP_CONFIG%' >> tmp_check/db/postgresql.conf
 - tmp_install\bin\pg_ctl.exe start -D tmp_check/db -l 
tmp_check/postmaster.log
   test_pl_script:
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 9a31e0b8795..14fd847ba7f 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -10,7 +10,7 @@ ISOLATION = mxact delayed_startup ondisk_startup 
concurrent_ddl_dml \
oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
twophase_snapshot
 
-REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
+REGRESS_OPTS = --temp-config=$(top_srcdir)/contrib/test_decoding/logical.conf
 ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 
 # Disabled because these tests require "wal_level=logical", which
diff --git a/src/tools/ci/pg_ci_base.conf b/src/tools/ci/pg_ci_base.conf
index d8faa9c26c1..52cdb697a57 100644
--- a/src/tools/ci/pg_ci_base.conf
+++ b/src/tools/ci/pg_ci_base.conf
@@ -12,3 +12,24 @@ log_connections = true
 log_disconnections = true
 log_line_prefix = '%m [%p][%b] %q[%a][%v:%x] '
 log_lock_waits = true
+
+# test_decoding
+wal_level = logical
+max_replication_slots = 4
+logical_decoding_work_mem = 64kB
+
+# commit_ts
+track_commit_timestamp = on
+
+## worker_spi
+#shared_preload_libraries = worker_spi
+#worker_spi.database = contrib_regression
+
+## pg_stat_statements
+##shared_preload_libraries=pg_stat_statements
+
+## test_rls_hooks
+#shared_preload_libraries=test_rls_hooks
+
+## snapshot_too_old
+#old_snapshot_threshold = 60min
diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index 8f3e3fa937b..7e2cc971a42 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -443,6 +443,7 @@ sub plcheck
 sub subdircheck
 {
my $module = shift;
+   my $obey_installcheck = shift || 1;
 
if (   !-d "$module/sql"
|| !-d "$module/expected"
@@ -452,7 +453,7 @@ sub subdircheck
}
 
chdir $module;
-   my @tests = fetchTests();
+   my @tests = fetchTests($obey_installcheck);
 
# Leave if no tests are listed in the module.
if (scalar @tests == 0)
@@ -516,6 +517,14 @@ sub contribcheck
my $status = $? >> 8;
$mstat ||= $status;
}
+
+   subdircheck('test_decoding', -1);
+   $mstat ||= $? >> 8;
+
+   # The DB would need to be restarted
+   #subdircheck('pg_stat_statements', -1);
+   #$mstat ||= $? >> 8;
+
exit $mstat if $mstat;
return;
 }
@@ -530,6 +539,19 @@ sub modulescheck
my $status = $? >> 8;
$mstat ||= $status;
}
+
+   subdircheck('commit_ts', -1);
+   $mstat ||= $? >> 8;
+
+   subdircheck('test_rls_hooks', -1);
+   $mstat ||= $? >> 8;
+
+   ## The DB would need to be restarted
+   #subdircheck('worker_spi', -1);
+   #$mstat ||= $? >> 8;
+
+   # src/test/modules/snapshot_too_old/Makefile
+
exit $mstat if $mstat;
return;
 }
@@ -726,6 +748,7 @@ sub fetchTests
my $m = <$handle>;
close($handle);
my $t = "";
+   my $obey_installcheck = shift || 1;
 
$m =~ s{\\\r?\n}{}g;
 
@@ -733,7 +756,7 @@ sub fetchTests
# so bypass its run by returning an empty set of tests.
if ($m =~ /^\s*NO_INSTALLCHECK\s*=\s*\S+/m)
{
-   return ();
+   return () if $obey_installcheck == 1;
}
 
if ($m =~ /^REGRESS\s*=\s*(.*)$/gm)




Re: Adding CI to our tree

2021-12-31 Thread Alvaro Herrera
On 2021-Dec-30, Andres Freund wrote:

> On 2021-12-30 17:46:52 -0800, Andres Freund wrote:
> > I plan to push this after another cycle of tests passing (and driving
> > over the bay bridge...) unless I hear protests.
> 
> Pushed.
> 
> Marked CF entry as committed.

I tried it using the column filter patch.  It worked on the first
attempt.

Thanks!

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: Adding CI to our tree

2021-12-30 Thread Andres Freund
On 2021-12-30 17:46:52 -0800, Andres Freund wrote:
> I plan to push this after another cycle of tests passing (and driving
> over the bay bridge...) unless I hear protests.

Pushed.

Marked CF entry as committed.




Re: Adding CI to our tree

2021-12-30 Thread Andres Freund
Hi,

On 2021-12-30 20:28:46 -0600, Justin Pryzby wrote:
> [ language fixes]

Thanks!

> script uses a pseudo-tty, which do support locking.
> => which does

This didn't seem right either way - it's pseudo-ttys that don't support
locking, so plural seemed appropriate? I changed it to "script uses
pseudo-ttys, which do" instead...

Regards,

Andres




Re: Adding CI to our tree

2021-12-30 Thread Justin Pryzby
commit message: agithub

the the the buildfarm.
=> the

access too.
=> to

# Due to that it using concurrency within prove is helpful.
=> Due to that, it's useful to run prove with multiple jobs.

further details src/tools/ci/README
=> further details , see src/tools/ci/README

script uses a pseudo-tty, which do support locking.
=> which does

To limit unneccessary work only run this once normal linux test succeeded
=> unnecessary
=> succeeds

-- 
Justin




Re: Adding CI to our tree

2021-12-30 Thread Andres Freund
Hi,

Attached is v5 of the CI patch. Not a lot of changes:
- a bunch of copy-editing, wrote a commit message etc
- use ccache for CXX/CLANG in the CompilerWarnings task, I had missed
  that when making the task use all --with-* flags
- switch linux to use ossp-uuid. I tried to switch macos at first, but
  that doesn't currently seem to work.
- minor other cleanups

This time I've only attached the main CI patch, not the one making core
dumps on windows work. That's not yet committable...

I plan to push this after another cycle of tests passing (and driving
over the bay bridge...) unless I hear protests.

Greetings,

Andres Freund
>From 124ec8dea062e95931d5d2f80633e025aa7733ad Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 15 Mar 2021 09:25:15 -0700
Subject: [PATCH v5] ci: Add continuous integration for github repositories via
 cirrus-ci.

Currently FreeBSD, Linux, macOS and Windows (Visual Studio) are tested.

The main goal of this integration is to make it easier to test in-development
patches across multiple platforms. This includes improving the testing done
automatically by cfbot [1] for commitfest entries.  It is *not* the goal to
supersede the buildfarm.

cirrus-ci [2] was chosen because it was already in use for cfbot, allows using
full VMs, has good OS coverage and allows accessing the full test results
without authentication (like agithub account).  It might be worth adding
support for further CI providers, particularly ones supporting other git
forges, in the future.

To keep CI times tolerable, most platforms use pre-generated images. Some
platforms use containers, others use full VMs.

For instructions on how to enable the CI integration in a repository and
further details src/tools/ci/README

[1] http://cfbot.cputube.org/
[2] https://cirrus-ci.org/

Author: Andres Freund 
Author: Thomas Munro 
Author: Melanie Plageman 
Reviewed-By: Melanie Plageman 
Reviewed-By: Justin Pryzby 
Reviewed-By: Thomas Munro 
Reviewed-By: Peter Eisentraut 
Discussion: https://postgr.es/m/20211001222752.wrz7erzh4cajv...@alap3.anarazel.de
---
 .cirrus.yml | 547 
 src/tools/ci/README |  63 +++
 src/tools/ci/cores_backtrace.sh |  50 +++
 src/tools/ci/gcp_freebsd_repartition.sh |  28 ++
 src/tools/ci/pg_ci_base.conf|  14 +
 src/tools/ci/windows_build_config.pl|  13 +
 6 files changed, 715 insertions(+)
 create mode 100644 .cirrus.yml
 create mode 100644 src/tools/ci/README
 create mode 100755 src/tools/ci/cores_backtrace.sh
 create mode 100755 src/tools/ci/gcp_freebsd_repartition.sh
 create mode 100644 src/tools/ci/pg_ci_base.conf
 create mode 100644 src/tools/ci/windows_build_config.pl

diff --git a/.cirrus.yml b/.cirrus.yml
new file mode 100644
index 000..f49799e9a32
--- /dev/null
+++ b/.cirrus.yml
@@ -0,0 +1,547 @@
+# CI configuration file for CI utilizing cirrus-ci.org
+#
+# For instructions on how to enable the CI integration in a repository and
+# further details src/tools/ci/README
+
+
+env:
+  # Source of images / containers
+  GCP_PROJECT: pg-ci-images
+  IMAGE_PROJECT: $GCP_PROJECT
+  CONTAINER_REPO: us-docker.pkg.dev/${GCP_PROJECT}/ci
+
+  # The lower depth accelerates git clone. Use a bit of depth so that
+  # concurrent tasks and retrying older jobs has a chance of working.
+  CIRRUS_CLONE_DEPTH: 500
+  # Useful to be able to analyse what in a script takes long
+  CIRRUS_LOG_TIMESTAMP: true
+
+  CCACHE_MAXSIZE: "250M"
+
+  # target to test, for all but windows
+  CHECK: check-world PROVE_FLAGS=$PROVE_FLAGS
+  CHECKFLAGS: -Otarget
+  PROVE_FLAGS: --timer
+  PGCTLTIMEOUT: 120 # avoids spurious failures during parallel tests
+  TEMP_CONFIG: ${CIRRUS_WORKING_DIR}/src/tools/ci/pg_ci_base.conf
+  PG_TEST_EXTRA: kerberos ldap ssl
+
+
+# What files to preserve in case tests fail
+on_failure: _failure
+  log_artifacts:
+path: "**/**.log"
+type: text/plain
+  regress_diffs_artifacts:
+path: "**/**.diffs"
+type: text/plain
+  tap_artifacts:
+path: "**/regress_log_*"
+type: text/plain
+
+
+task:
+  name: FreeBSD - 13
+
+  env:
+# FreeBSD on GCP is slow when running with larger number of CPUS /
+# jobs. Using one more job than cpus seems to work best.
+CPUS: 2
+BUILD_JOBS: 3
+TEST_JOBS: 3
+
+CCACHE_DIR: /tmp/ccache_dir
+
+  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*freebsd.*'
+
+  compute_engine_instance:
+image_project: $IMAGE_PROJECT
+image: family/pg-ci-freebsd-13-0
+platform: freebsd
+cpu: $CPUS
+memory: 2G
+disk: 50
+
+  sysinfo_script: |
+id
+uname -a
+ulimit -a -H && ulimit -a -S
+export
+
+  ccache_cache:
+folder: $CCACHE_DIR
+  # Workaround around performance issues due to 32KB block size
+  repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
+  create_user_script: |
+pw useradd postgres
+chown -R postgres:postgres .
+mkdir 

Re: Adding CI to our tree

2021-12-29 Thread Daniel Gustafsson
> On 29 Dec 2021, at 21:17, Andres Freund  wrote:
> On 2021-12-20 11:21:05 -0800, Andres Freund wrote:

>> Attached is v4 of the CI patch.
> 
> I'd like to push this - any objections? It's not disruptive to anything but
> cfbot, so we can incrementally improve it further.

No objection, I'm +1 on getting this in.

--
Daniel Gustafsson   https://vmware.com/





  1   2   >