Bug#414092: airport-utils: Tools start and quit immediately without working
I just did a quick check and the tools work for me on the current stable (11.6) and unstable. The only time they don't start is when X11 is not available, like in an ssh session:

$ java -verbose:class -jar /usr/share/java/airport-utils/AirportBaseStationConfig.jar
...
[0.170s][info][class,load] sun.awt.MostRecentKeyValue source: jrt:/java.desktop
[0.170s][info][class,load] sun.awt.PostEventQueue source: jrt:/java.desktop
[0.171s][info][class,load] java.util.Vector source: jrt:/java.base
[0.171s][info][class,load] java.awt.Window$Type source: jrt:/java.desktop
[0.171s][info][class,load] java.lang.UnsupportedOperationException source: jrt:/java.base
[0.171s][info][class,load] java.awt.HeadlessException source: jrt:/java.desktop
[0.171s][info][class,load] java.util.IdentityHashMap$IdentityHashMapIterator source: shared objects file
[0.171s][info][class,load] java.util.IdentityHashMap$KeyIterator source: shared objects file
[0.171s][info][class,load] java.lang.Shutdown source: shared objects file
[0.171s][info][class,load] java.lang.Shutdown$Lock source: shared objects file

I suppose the startup scripts could somehow check if X11 is not available and print a warning?

-- Valentin
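A startup-script check along these lines might work (a sketch only; the function name and warning wording are made up here, not taken from the package):

```shell
# Sketch of a display check the airport-utils wrapper scripts could run
# before exec'ing java (hypothetical helper, not from the package).
have_display() {
    # X11 sets DISPLAY; also accept a Wayland session for completeness
    [ -n "$DISPLAY" ] || [ -n "$WAYLAND_DISPLAY" ]
}

if ! have_display; then
    echo "warning: no X11 display found, the GUI tools will not start" >&2
fi
```

The wrapper could then exit early with a clear message instead of letting java die with a HeadlessException.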
Bug#1018930: [Debian-ha-maintainers] Bug#1018930: marked as done (pcs: CVE-2022-2735: Obtaining an authentication token for hacluster user leads to privilege escalation)
I checked pcs 0.10.1-2 in buster and it turns out it is not vulnerable to CVE-2022-2735. The separate ruby daemon with a world-writable UNIX socket was only introduced later, in 0.10.5:

https://salsa.debian.org/ha-team/pcs/-/commits/master/pcsd/pcsd-ruby.service.in

Before that version the python code runs the ruby commands itself and they communicate by exchanging JSON responses on stdin/stdout:

https://salsa.debian.org/ha-team/pcs/-/blob/38330deb0d849d6a1945856b24323043f6a7839b/pcs/daemon/ruby_pcsd.py

-- Valentin
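As a toy model of that stdin/stdout interface (the worker below is a stand-in shell function, not the real pcsd ruby code):

```shell
# The parent writes one JSON request to the worker's stdin and reads one
# JSON reply from its stdout - no listening socket is involved, so there
# is nothing world-writable to attack.
worker() {
    read -r req
    printf '{"status":"ok","request":%s}\n' "$req"
}

reply=$(printf '%s\n' '{"cmd":"cluster_status"}' | worker)
```

The security-relevant point is that only the process holding the pipe ends can talk to the worker, unlike a filesystem socket whose permissions have to be managed separately.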
Bug#1008379: closing 1008379
close 1008379 1.5.1-1
thanks

Tested the build of the new package release in an sbuild sid chroot and did not see any problems.

-- Valentin
Bug#994418: ocfs2-tools: failing autopkgtest on one of ci.d.n amd64 workers
Hi Paul,

On Thu, Sep 16, 2021 at 08:34:06AM +0200, Paul Gevers wrote:
> It was pointed out to me on IRC that the mount of /tmp with `nodev` is
> probably the issue here. I'm discussion if we should just drop that.

The failing test does not use a device so this probably won't help. I tried updating the test to use losetup, but it turns out losetup does not work with lxc.

It seems that O_DIRECT on tmpfs is a known problem and other software like mysql also doesn't work on tmpfs. There were some kernel patches to allow O_DIRECT on tmpfs, but they were apparently not accepted.

Perhaps it would be possible not to use tmpfs for $AUTOPKGTEST_TMP, or was that the goal in the first place?

-- Valentin
Bug#994418: ocfs2-tools: failing autopkgtest on one of ci.d.n amd64 workers
On Wed, Sep 15, 2021 at 09:24:08PM +0200, Paul Gevers wrote:
> I looked at the results of the autopkgtest of you package on amd64
> because with a recent upload of glibc the autopkgtest of ocfs2-tools
> fails in testing. It seems to me that the failures are related to the
> worker that the test runs on. ci-worker13 fails, while the other workers
> are OK. We recently changed the setup of ci-worker13, to have /tmp/ of
> the host on tmpfs as that speeds up testing considerably is a lot of
> cases. I copied some of the output at the bottom of this report, but I'm
> not 100% sure that the /tmp there (the one inside the lxc testbed) *is*
> on tmpfs.
>
> Don't hesitate to contact us at debian...@lists.debian.org if you need
> help debugging this issue.
>
> Paul
>
> https://ci.debian.net/data/autopkgtest/testing/amd64/o/ocfs2-tools/15277216/log.gz
>
> > autopkgtest [19:14:22]: test basic: [---
> > === disk ===
> > 200+0 records in
> > 200+0 records out
> > 209715200 bytes (210 MB, 200 MiB) copied, 0.109005 s, 1.9 GB/s
> > === mkfs ===
> > mkfs.ocfs2 1.8.6
> > mkfs.ocfs2: Could not open device
> > /tmp/autopkgtest-lxc.8neywhcx/downtmp/autopkgtest_tmp/disk: Invalid argument
> > autopkgtest [19:14:23]: test basic: ---]

Yes, tmpfs seems to be the problem since it doesn't support O_DIRECT, which is being requested here:

static void open_device(State *s)
{
	s->fd = open64(s->device_name, O_RDWR | O_DIRECT);

	if (s->fd == -1) {
		com_err(s->progname, 0,
			"Could not open device %s: %s",
			s->device_name, strerror(errno));
		exit(1);
	}
}

-- Valentin
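For the autopkgtest itself, one option would be to probe for O_DIRECT support up front and skip on tmpfs. A sketch (the probe filename and function are illustrative, not from the package):

```shell
# Probe whether a directory supports O_DIRECT by writing one block with
# dd oflag=direct; tmpfs rejects this with EINVAL, regular filesystems pass.
supports_o_direct() {
    probe="$1/odirect-probe.$$"
    if dd if=/dev/zero of="$probe" bs=4096 count=1 oflag=direct 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    rm -f "$probe"
    return 1
}
```

With the `skippable` restriction in debian/tests/control, the test could exit 77 when the probe fails, which autopkgtest reports as a skip rather than a failure.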
Bug#987441: s390x installation bugs
On Sun, Aug 01, 2021 at 09:45:00PM +0200, Valentin Vidic wrote:
> Thanks, that does sound similar to what I was getting there. I will try
> to see if it still happens with the latest installer. And since it
> crashes on start I had no way to access the logs or dmesg of the
> machine. Perhaps there is some installer option to help debug this kind
> of thing?

Just tested the rc3 installation with qemu-system-s390x. Installation went fairly quickly and without any problems. Great work everyone and happy release :)))

-- Valentin
Bug#987441: s390x installation bugs
On Sun, Aug 01, 2021 at 05:10:07PM +0200, Cyril Brulebois wrote:
> Valentin Vidic (2021-08-01):
> > No problem, I was not able to reproduce this reliably or get a core
> > dump for this crash. It could just be an emulation problem with qemu
> > or some timing issue for the first installer step. If there is no
> > update on this problem I think we can even close it for now.
>
> Speaking of the first step, did anyone mention #987368 before, now fixed
> in udpkg?

Thanks, that does sound similar to what I was getting there. I will try to see if it still happens with the latest installer. And since it crashes on start I had no way to access the logs or dmesg of the machine. Perhaps there is some installer option to help debug this kind of thing?

-- Valentin
Bug#987441: s390x installation bugs
On Mon, May 03, 2021 at 08:36:58AM +0200, Cyril Brulebois wrote:
> Also adding to the list of bugs to keep an eye on (again, possibly not
> blocking the release on its being resolved; we could have the issue
> listed in errata, and possibly fixed in a point release).

Thanks, here is another one for s390x, should be relatively simple if you wish to link it here:

linux: Debian installation fails in qemu-system-s390x due to missing virtio_blk module
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=988005

-- Valentin
Bug#987441: Bug#926539: Bug#987441: s390x installation bugs
On Mon, May 03, 2021 at 08:58:02AM +0200, John Paul Adrian Glaubitz wrote:
> > The same issue exists on s390x but isn't apparently going to get fixed
> > so we need to have d-i be smarter (hence the merge request)?
>
> Seems so.

The QEMU console might get fixed in the kernel, but it looks like LPAR could have a similar problem (I don't have access to test this). So it seems better (and future proof) to fix this on the Debian side too. I have updated the merge request to trigger the new code only on s390x as suggested:

https://salsa.debian.org/installer-team/rootskel/-/merge_requests/2

> > I'd suggest at least retitling the bug report to mention s390x (release
> > arch, affected) instead of sparc64 (port arch, no longer affected), to
> > lower the chances people could overlook this issue, thinking it's only
> > about a port arch.
>
> We could also unmerge #926539 and #961056 again, then close the former bug
> which was sparc64-specific.

I have unmerged the bugs now, so the sparc one can be closed.

-- Valentin
Bug#987441: s390x installation bugs
Hi,

Probably not critical, but maybe these installation bugs on s390x could be fixed for the release?

rootskel: steal-ctty no longer works on at least sparc64
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=926539

debian-installer: qemu-system-s390x installation fails due to segfault in main-menu
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=987788

-- Valentin
Bug#987351: claim bug
user debian-rele...@lists.debian.org
usertags -1 + bsp-2021-04-AT-Salzburg
thank you

-- Valentin
Bug#975543: fence-agents autopkg tests time out
On Mon, Nov 23, 2020 at 11:46:54AM +0100, Matthias Klose wrote:
> Package: src:fence-agents
> Version: 4.6.0-2
> Severity: serious
> Tags: sid bullseye
> User: debian-pyt...@lists.debian.org
> Usertags: python3.9
>
> fence-agents autopkg tests time out, might not be Python 3.9 specific.

Yup, this should be the dash wait hang tracked in #974705.

-- Valentin
Bug#974705: fence-agents test hangs
One of the autopkgtests in the fence-agents package seems to be broken in the same way - it just hangs in wait forever, while using bash works:

https://salsa.debian.org/ha-team/fence-agents/-/blob/master/debian/tests/delay

-- Valentin
Bug#945881: bgpdump: Segmentation fault
Package: bgpdump
Version: 1.6.0-1
Severity: grave

Dear Maintainer,

The program segfaults when started:

$ bgpdump
Segmentation fault

Based on gdb info it seems like the call to log_to_stderr fails:

(gdb) bt
#0  0x2246 in ?? ()
#1  0x77fef5cf in main ()

(gdb) disas main
Dump of assembler code for function main:
   0x77fef5a0 <+0>:     push   %r15
   0x77fef5a2 <+2>:     xor    %r15d,%r15d
   0x77fef5a5 <+5>:     push   %r14
   0x77fef5a7 <+7>:     mov    $0x1,%r14d
   0x77fef5ad <+13>:    push   %r13
   0x77fef5af <+15>:    lea    0xa055(%rip),%r13    # 0x77ff960b
   0x77fef5b6 <+22>:    push   %r12
   0x77fef5b8 <+24>:    mov    %rsi,%r12
   0x77fef5bb <+27>:    push   %rbp
   0x77fef5bc <+28>:    mov    %edi,%ebp
   0x77fef5be <+30>:    push   %rbx
   0x77fef5bf <+31>:    lea    0xa7c2(%rip),%rbx    # 0x77ff9d88
   0x77fef5c6 <+38>:    sub    $0x18,%rsp
=> 0x77fef5ca <+42>:    callq  0x77fef240

-- System Information:
Debian Release: 10.2
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.19.0-6-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages bgpdump depends on:
ii  libbsd0     0.9.1-2
ii  libbz2-1.0  1.0.6-9.2~deb10u1
ii  libc6       2.28-10
ii  zlib1g      1:1.2.11.dfsg-1

bgpdump recommends no packages.

bgpdump suggests no packages.

-- no debconf information
Bug#935562: ITP: google-auth-httplib2 -- Google Authentication Library: httplib2 transport
Package: wnpp
Severity: wishlist
Owner: Valentin Vidic

* Package name    : google-auth-httplib2
  Version         : 0.0.3
  Upstream Author : Google Cloud Platform
* URL             : https://github.com/GoogleCloudPlatform/google-auth-library-python-httplib2
* License         : Apache 2.0
  Programming Lang: Python
  Description     : Google Authentication Library: httplib2 transport

Python library providing a httplib2 transport for google-auth. This library is intended to help existing users of oauth2client migrate to google-auth.

The intent of this package is to be used together with the existing python3-googleapi package (see #935562). The package will be maintained by the DPMT on salsa.
Bug#934519: stretch-pu: package fence-agents/4.0.25-1+deb9u2
Package: release.debian.org
Severity: normal
Tags: stretch
User: release.debian@packages.debian.org
Usertags: pu

Hi,

Please allow an update for the fence-agents package fixing the occasional FTBFS reported in #934519. Patch for the change follows below.

diff -Nru fence-agents-4.0.25/debian/changelog fence-agents-4.0.25/debian/changelog
--- fence-agents-4.0.25/debian/changelog	2019-06-30 19:01:55.0 +0200
+++ fence-agents-4.0.25/debian/changelog	2019-09-29 12:27:01.0 +0200
@@ -1,3 +1,9 @@
+fence-agents (4.0.25-1+deb9u2) stretch; urgency=medium
+
+  * Update patch for removing fence_amt_ws (Closes: #934519)
+
+ -- Valentin Vidic  Sun, 29 Sep 2019 12:27:01 +0200
+
 fence-agents (4.0.25-1+deb9u1) stretch; urgency=medium
 
   * fence_rhevm: add patch for CVE-2019-10153 (Closes: #930887)
diff -Nru fence-agents-4.0.25/debian/patches/remove-fence_amt_ws fence-agents-4.0.25/debian/patches/remove-fence_amt_ws
--- fence-agents-4.0.25/debian/patches/remove-fence_amt_ws	2019-06-30 19:01:55.0 +0200
+++ fence-agents-4.0.25/debian/patches/remove-fence_amt_ws	2019-09-29 12:27:01.0 +0200
@@ -1,16 +1,16 @@
 --- a/configure.ac
 +++ b/configure.ac
-@@ -142,6 +142,9 @@ if test "x$AGENTS_LIST" = xall; then
- 	AGENTS_LIST=`find $srcdir/fence/agents -mindepth 2 -maxdepth 2 -name '*.py' -printf '%P ' | sed -e 's#lib/[A-Za-z_.]* ##g' -e 's#nss_wrapper/[A-Za-z_.]* ##g' -e 's#autodetect/[A-Za-z_.]* ##g'`
+@@ -139,7 +139,8 @@
+ fi
+ 
+ if test "x$AGENTS_LIST" = xall; then
+-	AGENTS_LIST=`find $srcdir/fence/agents -mindepth 2 -maxdepth 2 -name '*.py' -printf '%P ' | sed -e 's#lib/[A-Za-z_.]* ##g' -e 's#nss_wrapper/[A-Za-z_.]* ##g' -e 's#autodetect/[A-Za-z_.]* ##g'`
++	# remove fence_amt_ws because we don't have openwsman (and sblim-sfcc) in Debian
++	AGENTS_LIST=`find $srcdir/fence/agents -mindepth 2 -maxdepth 2 -name '*.py' ! -name 'fence_amt_ws.py' -printf '%P ' | sed -e 's#lib/[A-Za-z_.]* ##g' -e 's#nss_wrapper/[A-Za-z_.]* ##g' -e 's#autodetect/[A-Za-z_.]* ##g'`
  fi
-+# remove fence_amt_ws because we don't have openwsman (and sblim-sfcc) in Debian
-+AGENTS_LIST=$(echo $AGENTS_LIST | sed -e "s!amt_ws/fence_amt_ws.py !!")
-+
 XENAPILIB=0
- if echo "$AGENTS_LIST" | grep -q xenapi; then
- XENAPILIB=1
-@@ -163,7 +166,8 @@ AC_PYTHON_MODULE(suds, 1)
+@@ -163,7 +164,8 @@
 AC_PYTHON_MODULE(pexpect, 1)
 AC_PYTHON_MODULE(pycurl, 1)
 AC_PYTHON_MODULE(requests, 1)
Bug#934519: buster-pu: package fence-agents/4.3.3-2+deb10u1
Package: release.debian.org
Severity: normal
Tags: buster
User: release.debian@packages.debian.org
Usertags: pu

Hi,

Please allow an update for the fence-agents package fixing the occasional FTBFS reported in #934519. Patch for the change follows below.

diff -Nru fence-agents-4.3.3/debian/changelog fence-agents-4.3.3/debian/changelog
--- fence-agents-4.3.3/debian/changelog	2019-06-23 19:53:35.0 +0200
+++ fence-agents-4.3.3/debian/changelog	2019-09-29 11:54:16.0 +0200
@@ -1,3 +1,9 @@
+fence-agents (4.3.3-2+deb10u1) buster; urgency=medium
+
+  * Update patch for removing fence_amt_ws (Closes: #934519)
+
+ -- Valentin Vidic  Sun, 29 Sep 2019 11:54:16 +0200
+
 fence-agents (4.3.3-2) unstable; urgency=high
 
   * fence_rhevm: add patch for CVE-2019-10153 (Closes: #930887)
diff -Nru fence-agents-4.3.3/debian/patches/remove-fence_amt_ws fence-agents-4.3.3/debian/patches/remove-fence_amt_ws
--- fence-agents-4.3.3/debian/patches/remove-fence_amt_ws	2018-10-06 22:30:46.0 +0200
+++ fence-agents-4.3.3/debian/patches/remove-fence_amt_ws	2019-09-29 11:52:14.0 +0200
@@ -6,13 +6,13 @@
 This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
 --- a/configure.ac
 +++ b/configure.ac
-@@ -176,6 +176,9 @@
- 	AGENTS_LIST=`find $srcdir/agents -mindepth 2 -maxdepth 2 -name 'fence_*.py' -print0 | xargs -0 | sed -e 's#[^ ]*/agents/##g' -e 's#lib/[A-Za-z_.]* ##g' -e 's#nss_wrapper/[A-Za-z_.]* ##g' -e 's#autodetect/[A-Za-z_.]* ##g'`
+@@ -175,7 +175,8 @@
+ fi
+ 
+ if test "x$AGENTS_LIST" = xall; then
+-	AGENTS_LIST=`find $srcdir/agents -mindepth 2 -maxdepth 2 -name 'fence_*.py' -print0 | xargs -0 | sed -e 's#[^ ]*/agents/##g' -e 's#lib/[A-Za-z_.]* ##g' -e 's#nss_wrapper/[A-Za-z_.]* ##g' -e 's#autodetect/[A-Za-z_.]* ##g'`
++	# remove fence_amt_ws because we don't have openwsman (and sblim-sfcc) in Debian
++	AGENTS_LIST=`find $srcdir/agents -mindepth 2 -maxdepth 2 -name 'fence_*.py' ! -name fence_amt_ws.py -print0 | xargs -0 | sed -e 's#[^ ]*/agents/##g' -e 's#lib/[A-Za-z_.]* ##g' -e 's#nss_wrapper/[A-Za-z_.]* ##g' -e 's#autodetect/[A-Za-z_.]* ##g'`
 fi
-+# remove fence_amt_ws because we don't have openwsman (and sblim-sfcc) in Debian
-+AGENTS_LIST=$(echo $AGENTS_LIST | sed -e "s!amt_ws/fence_amt_ws.py !!")
-+
 XENAPILIB=0
- if echo "$AGENTS_LIST" | grep -q xenapi; then
- XENAPILIB=1
Bug#925354: [Debian-ha-maintainers] Bug#925354: pacemaker-dev: missing Breaks+Replaces: libcrmcluster1-dev
On Mon, Mar 25, 2019 at 03:45:58PM +0100, Andreas Beckmann wrote:
> In that case you should probably add Breaks+Replaces against all of the
> old -dev packages that were merged, just to be on the safe side.

Yes, that is the plan. I think wferi will take care of it if he has time.

-- Valentin
Bug#925354: [Debian-ha-maintainers] Bug#925354: pacemaker-dev: missing Breaks+Replaces: libcrmcluster1-dev
On Sat, Mar 23, 2019 at 05:19:59PM +0100, Andreas Beckmann wrote:
> during a test with piuparts I noticed your package fails to upgrade from
> 'wheezy' to 'jessie' to 'stretch' to 'buster'.
> It installed fine in 'wheezy', and upgraded to 'jessie' and 'stretch'
> successfully, but then the upgrade to 'buster' failed.
>
> In case the package was not part of an intermediate stable release,
> the version from the preceding stable release was kept installed.
>
> From the attached log (scroll to the bottom...):
>
>   Selecting previously unselected package pacemaker-dev:amd64.
>   Preparing to unpack .../10-pacemaker-dev_2.0.1-1_amd64.deb ...
>   Unpacking pacemaker-dev:amd64 (2.0.1-1) ...
>   dpkg: error processing archive /tmp/apt-dpkg-install-UW7jMV/10-pacemaker-dev_2.0.1-1_amd64.deb (--unpack):
>    trying to overwrite '/usr/include/pacemaker/crm/attrd.h', which is also in package libcrmcluster1-dev 1.1.7-1
>   dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)
>   Errors were encountered while processing:
>    /tmp/apt-dpkg-install-UW7jMV/10-pacemaker-dev_2.0.1-1_amd64.deb

Yep, all -dev packages were merged at one point into pacemaker-dev. Breaks+Replaces on the old packages should do the trick here.

-- Valentin
Bug#776246: Processed: severity of 776246 is grave
On Tue, Feb 19, 2019 at 10:26:09AM +0100, Christoph Martin wrote:
> What can we do to not loose these packages (burp in my case)?
>
> librsync 2.0.2-1~exp1 was uploaded to experimental three days ago.

csync2 seems to build fine with librsync2 from experimental, so if you can upload that to unstable, maybe we can still save some of the affected packages.

-- Valentin
Bug#776246: Processed: severity of 776246 is grave
On Tue, Feb 19, 2019 at 10:26:09AM +0100, Christoph Martin wrote:
> What can we do to not loose these packages (burp in my case)?
>
> librsync 2.0.2-1~exp1 was uploaded to experimental three days ago.

I guess librsync2 would need to go into unstable and testing first. Then we can try to update our apps to the new API and enter testing again. I am not sure this is realistic at this point in the release process, which is why I suggested setting the severity to grave only after buster is out.

-- Valentin
Bug#776246: Processed: severity of 776246 is grave
Hi,

I am not sure why the severity was raised to grave so late in the release process, causing us to lose some packages (csync2 in my case)? Raising it to grave after the release would give us more time to move to librsync2.

-- Valentin
Bug#919901: [Debian-ha-maintainers] Bug#919901: Bug#919901: corosync-qnetd: fails to upgrade from 'stretch': certutil: Could not set password for the slot
On Thu, Jan 24, 2019 at 10:27:39PM +0100, Valentin Vidic wrote:
> Password file indeed seems to be empty on stretch:
>
> drwxr-x--- 2 root coroqnetd 4096 Jan 24 22:22 .
> drwxr-xr-x 3 root root 4096 Jan 24 22:22 ..
> -rw-r- 1 root coroqnetd 65536 Jan 24 22:22 cert8.db
> -rw-r- 1 root coroqnetd 16384 Jan 24 22:22 key3.db
> -rw-r- 1 root root 41 Jan 24 22:22 noise.txt
> -rw-r- 1 root root 0 Jan 24 22:22 pwdfile.txt
> -rw-r--r-- 1 root root 4223 Jan 24 22:22 qnetd-cacert.crt
> -rw-r- 1 root root 16384 Jan 24 22:22 secmod.db
> -rw-r- 1 root root 4 Jan 24 22:22 serial.txt

Seems the magic upgrade command is:

# password file should have an empty line to be accepted
test -f "$db/pwdfile.txt" -a ! -s "$db/pwdfile.txt" && echo > "$db/pwdfile.txt"

certutil -N -d "sql:$db" -f "$db/pwdfile.txt" -@ "$db/pwdfile.txt"

-- Valentin
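The empty-file fixup could be wrapped for a maintainer script along these lines (a sketch; the function name is made up, and $db stands for wherever the package keeps the NSS database):

```shell
# certutil rejects a zero-byte password file, so make sure an empty
# pwdfile.txt contains at least a single newline before re-keying the
# database (hypothetical helper, not the actual postinst code).
fix_pwdfile() {
    f="$1"
    if [ -f "$f" ] && [ ! -s "$f" ]; then
        echo > "$f"
    fi
}
```

A non-empty password file is left untouched, so the fixup is safe to run on every upgrade.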
Bug#919901: [Debian-ha-maintainers] Bug#919901: corosync-qnetd: fails to upgrade from 'stretch': certutil: Could not set password for the slot
On Sun, Jan 20, 2019 at 05:07:25PM +0100, Andreas Beckmann wrote:
> Setting up corosync-qnetd (3.0.0-1) ...
> password file contains no data
> Invalid password.
> certutil: Could not set password for the slot: SEC_ERROR_INVALID_ARGS:
> security library: invalid arguments.
> dpkg: error processing package corosync-qnetd (--configure):
>  installed corosync-qnetd package post-installation script subprocess
>  returned error exit status 255
> Processing triggers for libc-bin (2.28-5) ...
> Errors were encountered while processing:
>  corosync-qnetd

Password file indeed seems to be empty on stretch:

drwxr-x--- 2 root coroqnetd 4096 Jan 24 22:22 .
drwxr-xr-x 3 root root 4096 Jan 24 22:22 ..
-rw-r- 1 root coroqnetd 65536 Jan 24 22:22 cert8.db
-rw-r- 1 root coroqnetd 16384 Jan 24 22:22 key3.db
-rw-r- 1 root root 41 Jan 24 22:22 noise.txt
-rw-r- 1 root root 0 Jan 24 22:22 pwdfile.txt
-rw-r--r-- 1 root root 4223 Jan 24 22:22 qnetd-cacert.crt
-rw-r- 1 root root 16384 Jan 24 22:22 secmod.db
-rw-r- 1 root root 4 Jan 24 22:22 serial.txt

-- Valentin
Bug#918944: [Debian-ha-maintainers] Bug#918944: Autopkgtest failure with rails 5/rack 2
On Fri, Jan 11, 2019 at 12:32:05AM +0530, Pirate Praveen wrote: > Package: pcs > Version: 0.9.166-5 > Severity: serious > > https://ci.debian.net/packages/p/pcs/unstable/amd64 > > May be 0.10 version has a fix, it is delaying rails 5 testing > migration, so please fix it. Yep, I'm looking at 0.10.1, but it has a lot of changes so it might take a few more days to get it working. -- Valentin
Bug#911801: closing 911801
close 911801
thanks

Trying to run the setup command always returns 401:

# pcs cluster setup --name pacemaker1 stretch1 stretch2
Error: stretch1: unable to authenticate to node
Error: stretch2: unable to authenticate to node
Error: nodes availability check failed, use --force to override. WARNING: This will destroy existing cluster on the nodes.

# pcs cluster setup --name pacemaker1 stretch1 stretch2 --force
Destroying cluster on nodes: stretch1, stretch2...
stretch1: Unable to authenticate to stretch1 - (HTTP error: 401), try running 'pcs cluster auth'
stretch2: Unable to authenticate to stretch2 - (HTTP error: 401), try running 'pcs cluster auth'
stretch2: Unable to authenticate to stretch2 - (HTTP error: 401), try running 'pcs cluster auth'
stretch1: Unable to authenticate to stretch1 - (HTTP error: 401), try running 'pcs cluster auth'
Error: unable to destroy cluster
stretch2: Unable to authenticate to stretch2 - (HTTP error: 401), try running 'pcs cluster auth'
stretch1: Unable to authenticate to stretch1 - (HTTP error: 401), try running 'pcs cluster auth'

Also, even when running with --force, no file gets removed, so I see no reason for severity grave with the justification "causes non-serious data loss". The instructions in README.Debian should still work, so please try to use those for setting up pcs clusters.

-- Valentin
Bug#911801: [Debian-ha-maintainers] Bug#911801: pacemaker: Cannot complete pcs cluster setup command, returns error HTTP401
On Thu, Oct 25, 2018 at 05:11:17PM +, Duncan Hare wrote:
> root@greene:/home/duncan# pcs cluster setup --name pacemaker1 pinke greene
> greene: Authorized
> pinke: Authorized
> root@greene:/home/duncan# pcs cluster setup --name pacemaker1 pinke greene --force
> Destroying cluster on nodes: pinke, greene...
> pinke: Unable to authenticate to pinke - (HTTP error: 401), try running 'pcs cluster auth'
> greene: Unable to authenticate to greene - (HTTP error: 401), try running 'pcs cluster auth'
> pinke: Unable to authenticate to pinke - (HTTP error: 401), try running 'pcs cluster auth'
> greene: Unable to authenticate to greene - (HTTP error: 401), try running 'pcs cluster auth'
> Error: unable to destroy cluster
> greene: Unable to authenticate to greene - (HTTP error: 401), try running 'pcs cluster auth'
> pinke: Unable to authenticate to pinke - (HTTP error: 401), try running 'pcs cluster auth'
> root@greene:/home/duncan#
>
> this works: rm /etc/corosync/corosync.conf
>
> Debian Bug report logs - #847295
> pcs cluster setup does not overwrite existing config files, and then the
> cluster create fails.

Yes, I think removing corosync.conf is documented in README.Debian:

  As PCS expects Corosync and Pacemaker to be in unconfigured state, the
  following command needs to be executed on all cluster nodes to stop the
  services and delete their default configuration:

  # pcs cluster destroy
  Shutting down pacemaker/corosync services...
  Killing any remaining services...
  Removing all cluster configuration files...

-- Valentin
Bug#911801: [Debian-ha-maintainers] Bug#911801: pacemaker: Cannot complete pcs cluster setup command, returns error HTTP401
On Wed, Oct 24, 2018 at 05:19:02PM -0700, Duncan Hare wrote:
> Package: pacemaker
> Version: 1.1.16-1
> Severity: grave
> Justification: causes non-serious data loss

I've reassigned this to the pcs package, since it probably doesn't have anything to do with pacemaker, but I'm not sure what is going on here. Can you provide some more info on the problem and the pcs commands that were used so I can try to reproduce? Also, perhaps the README.Debian included in the pcs package could help if this is an initial installation of the cluster.

-- Valentin
Bug#911177: [PATCH] dlm: Toplevel Makefile always returns success
Check exit codes from each of the subdirectories.
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index dd29bcea..ab069a1c 100644
--- a/Makefile
+++ b/Makefile
@@ -1,2 +1,2 @@
 all install clean: %:
-	for d in libdlm dlm_controld dlm_tool fence; do $(MAKE) -C $$d $@; done
+	set -e; for d in libdlm dlm_controld dlm_tool fence; do $(MAKE) -C $$d $@; done
-- 
2.19.0
Bug#911177: [Debian-ha-maintainers] Bug#911177: dlm: does not trap errors from make
On Tue, Oct 16, 2018 at 09:22:22PM +0200, Helmut Grohne wrote:
> The upstream Makefile runs submakes in a for loop without any error
> trapping. Thus it continues building even in the presence of failures.
> Doing so violates the Debian policy section 4.6. The recommended
> solution is adding "set -e".

Confirmed, I will send a patch upstream...

-- Valentin
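The effect is easy to demonstrate in a few lines of shell: a for loop only propagates the status of its last iteration, so an early failure is lost unless set -e aborts the loop (the function names below are made up for the demo):

```shell
# First iteration fails, second succeeds: without set -e the shell
# reports overall success, with set -e it stops at the first failure.
loop_plain() { sh -c 'for d in 1 2; do test "$d" != 1; done'; }
loop_set_e() { sh -c 'set -e; for d in 1 2; do test "$d" != 1; done'; }

loop_plain && echo "plain loop: exit 0, failure hidden from make"
loop_set_e || echo "set -e loop: non-zero exit, failure propagated"
```

This is exactly the situation in the dlm Makefile: an early subdirectory can fail to build while the last one succeeds, and make never notices.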
Bug#902117: [Debian-ha-maintainers] Bug#902117: corosync-qdevice will not daemonize/run
On Fri, Jul 06, 2018 at 12:50:42PM +0200, Ferenc Wágner wrote:
> Thanks for the report. I've been pretty busy with other tasks, but I'll
> check this out as soon as possible, your report isn't forgotten. I ask
> for you patience till then.

Feri, do you still want to check this or should we close this issue?

-- Valentin
Bug#902117: [Debian-ha-maintainers] Bug#902117: corosync-qdevice will not daemonize/run
On Fri, Jul 06, 2018 at 04:43:00PM -0400, Jason Gauthier wrote:
> Now, it's entirely possible that I do have a configuration issue
> causing corosync-qdevice to not start. However, the real issue is
> that corosync-qdevice does not log anything to stdout when run with
> "-f -d" (foreground, debug).

Just tried this on unstable and you are right, there is no output for "-f -d", but I do get this in daemon.log:

Jul  7 14:29:51 sid1 corosync-qdevice[1507]: Configuring qdevice
Jul  7 14:29:51 sid1 corosync-qdevice[1507]: Can't read quorum.device.model cmap key.
Jul  7 14:29:55 sid1 corosync-qdevice[1511]: Initializing votequorum
Jul  7 14:29:55 sid1 corosync-qdevice[1511]: shm size:1048589; real_size:1052672; rb->word_size:263168
Jul  7 14:29:55 sid1 corosync-qdevice[1511]: shm size:1048589; real_size:1052672; rb->word_size:263168
Jul  7 14:29:55 sid1 corosync-qdevice[1511]: shm size:1048589; real_size:1052672; rb->word_size:263168
Jul  7 14:29:55 sid1 corosync-qdevice[1511]: Initializing local socket
Jul  7 14:29:55 sid1 corosync-qdevice[1511]: Registering qdevice models
Jul  7 14:29:55 sid1 corosync-qdevice[1511]: Configuring qdevice
Jul  7 14:29:55 sid1 corosync-qdevice[1511]: Can't read quorum.device.model cmap key.

Maybe stdout does not exist for this service or you need to tune this part of corosync.conf:

logging {
	fileline: off
	to_stderr: no
	to_logfile: no
	logfile: /var/log/corosync/corosync.log
	to_syslog: yes
	debug: off
	timestamp: on
	logger_subsys {
		subsys: QUORUM
		debug: off
	}
}

-- Valentin
Bug#902117: [Debian-ha-maintainers] Bug#902117: corosync-qdevice will not daemonize/run
On Fri, Jun 22, 2018 at 09:46:36AM -0400, Jason Gauthier wrote:
> corosync-qdevice is a daemon that runs on each cluster node that help
> provide a voting subsystem that utilizes corosync-qnet outside the
> cluster.
>
> After installing the packages from debian stretch, and configuring the
> application, it does not run. One can use -d and -f to troubleshoot
> issues, and even in this situation no data is logged to the console,
> or any syslog messages generated. The application immediately fails.

corosync-qdevice is configured in corosync.conf; can you share the quorum block from there? If this is not configured I get the following error:

Jul  6 16:08:08 node1 corosync-qdevice[2778]: Can't read quorum.device.model cmap key.

But with a correct configuration it starts fine for me.

-- Valentin
Bug#901100: [Debian-ha-maintainers] Bug#901100: cluster-glue-dev: missing Breaks+Replaces: cluster-glue (<< 1.0.12-8)
On Sat, Jun 09, 2018 at 02:41:24PM +0200, Ferenc Wágner wrote:
> Are those .a (and .la) files really needed? We mostly avoid shipping
> them, see:
> http://www.debian.org/doc/debian-policy/ch-sharedlibs.html#s-sharedlibs-dev
> https://www.debian.org/doc/manuals/maint-guide/advanced.en.html#library
> https://wiki.debian.org/ReleaseGoals/LAFileRemoval

Yes, we should probably investigate whether those static libs, and cluster-glue as a whole, are still used these days and remove what is not needed anymore...

-- Valentin
Bug#900332: [Debian-ha-maintainers] Bug#900332: cluster-glue: FTBFS: error: unknown type name 'selector_t'; did you mean 'sel_timer_t'?
On Tue, May 29, 2018 at 09:54:13AM +0200, Emilio Pozuelo Monfort wrote:
> Your package failed to build on a rebuild against libcurl4:
>
> libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../../../include -I../../../include -I../../../include -I../../../linux-ha -I../../../linux-ha -I../../../libltdl -I../../../libltdl -Wdate-time -D_FORTIFY_SOURCE=2 -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -I/usr/include/libxml2 -g -O2 -fdebug-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -ggdb -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing-declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -g -O2 -fdebug-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -ggdb -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing-declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -MT apcmaster.lo -MD -MP -MF .deps/apcmaster.Tpo -c apcmaster.c -fPIC -DPIC -o .libs/apcmaster.o
> ipmilan_command.c:52:1: error: unknown type name 'selector_t'; did you mean 'sel_timer_t'?
>  selector_t *os_sel;
>  ^~
>  sel_timer_t
> ipmilan_command.c:87:16: error: unknown type name 'selector_t'; did you mean 'sel_timer_t'?
>  void timed_out(selector_t *sel, sel_timer_t *timer, void *data);
>                 ^~
>                 sel_timer_t
> ipmilan_command.c:90:11: error: unknown type name 'selector_t'; did you mean 'sel_timer_t'?
>  timed_out(selector_t *sel, sel_timer_t *timer, void *data)
>            ^~
>            sel_timer_t

Yes, but this is probably due to a new openipmi version rather than libcurl4. Will check...

-- Valentin
Bug#897508: [Debian-ha-maintainers] Bug#897508: Bug#897508: gfs2-utils: FTBFS: dh_auto_test: make -j8 -Oline check VERBOSE=1 returned exit code 2
On Wed, May 02, 2018 at 11:02:55PM +0200, Valentin Vidic wrote:
> Thanks, I can reproduce the errors too and will try to figure it out.
> The build was definitely working less than month ago when the last
> version was released, so something else in the system has changed.

After updating linux-libc-dev from 4.15.17-1 to 4.16.5-1, mkfs.gfs2 starts to segfault due to a change in gfs2_ondisk.h. I've contacted cluster-de...@redhat.com to see if they already have a fix...

-- Valentin
Bug#897508: [Debian-ha-maintainers] Bug#897508: gfs2-utils: FTBFS: dh_auto_test: make -j8 -Oline check VERBOSE=1 returned exit code 2
On Wed, May 02, 2018 at 10:44:33PM +0200, Lucas Nussbaum wrote:
> During a rebuild of all packages in sid, your package failed to build on
> amd64.

Thanks, I can reproduce the errors too and will try to figure it out. The build was definitely working less than a month ago when the last version was released, so something else in the system has changed.

-- Valentin
Bug#880554: [Pkg-xen-devel] Bug#880554: Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64
On Tue, Feb 27, 2018 at 08:22:50PM +0100, Valentin Vidic wrote:
> Since I can't reproduce it easily anymore I suspect something was
> fixed in the meanwhile. My original report was for 4.9.30-2+deb9u2
> and since then there seems to be a number of fixes that could be
> related to this:

Just rebooted both dom0 and domU with 4.9.30-2+deb9u2 and the postgresql domU is having problems right away after boot:

domid=1: nr_frames=32, max_nr_frames=32
[  242.652100] INFO: task kworker/u90:0:6 blocked for more than 120 seconds.

Upgrading the kernels and I can't get it above 11 anymore:

domid=1: nr_frames=11, max_nr_frames=32

So some of those many kernel fixes did the trick and things just work fine with the newer kernels without raising gnttab_max_frames.

-- Valentin
Bug#880554: [Pkg-xen-devel] Bug#880554: Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64
On Tue, Feb 27, 2018 at 05:05:06PM +0100, Hans van Kranenburg wrote:
> ad 1. Christian, Valentin, can you give more specific info that can help
> someone else to set up a test environment to trigger > 32 values.

I can't touch the original VM that had this issue and tried to reproduce on another host with recent stretch kernels, but without success. The maximum number I can get now is nr_frames=11.

Another detail I forgot to mention earlier is that my VMs were using DRBD disks. Since DRBD acts like a slow disk, it could cause IO requests to pile up and hit the limit faster.

Since I can't reproduce it easily anymore I suspect something was fixed in the meanwhile. My original report was for 4.9.30-2+deb9u2 and since then there have been a number of fixes that could be related to this:

linux (4.9.65-3) stretch; urgency=medium
  * xen/time: do not decrease steal time after live migration on xen

linux (4.9.65-1) stretch; urgency=medium
  - swiotlb-xen: implement xen_swiotlb_dma_mmap callback
  - xen-netback: Use GFP_ATOMIC to allocate hash
  - xen/gntdev: avoid out of bounds access in case of partial gntdev_mmap()
  - xen/manage: correct return value check on xenbus_scanf()
  - xen: don't print error message in case of missing Xenstore entry
  - xen/netback: set default upper limit of tx/rx queues to 8

linux (4.9.47-1) stretch; urgency=medium
  - nvme: use blk_mq_start_hw_queues() in nvme_kill_queues()
  - nvme: avoid to use blk_mq_abort_requeue_list()
  - efi: Don't issue error message when booted under Xen
  - xen/privcmd: Support correctly 64KB page granularity when mapping memory
  - xen/blkback: fix disconnect while I/Os in flight
  - xen/blkback: don't use xen_blkif_get() in xen-blkback kthread
  - xen/blkback: don't free be structure too early
  - xen-netback: fix memory leaks on XenBus disconnect
  - xen-netback: protect resource cleaning on XenBus disconnect
  - swiotlb-xen: update dev_addr after swapping pages
  - xen-netfront: Fix Rx stall during network stress and OOM
  - [x86] mm: Fix flush_tlb_page() on Xen
  - xen-netfront: Rework the fix for Rx stall during OOM and network stress
  - xen/scsiback: Fix a TMR related use-after-free
  - [x86] xen: allow userspace access during hypercalls
  - [armhf] Xen: Zero reserved fields of xatp before making hypervisor call
  - xen-netback: correctly schedule rate-limited queues
  - nbd: blk_mq_init_queue returns an error code on failure, not NULL
  - xen: fix bio vec merging (CVE-2017-12134) (Closes: #866511)
  - blk-mq-pci: add a fallback when pci_irq_get_affinity returns NULL
  - xen-blkfront: use a right index when checking requests

linux (4.9.30-2+deb9u4) stretch-security; urgency=high
  * xen: fix bio vec merging (CVE-2017-12134) (Closes: #866511)

linux (4.9.30-2+deb9u3) stretch-security; urgency=high
  * xen-blkback: don't leak stack data via response ring (CVE-2017-10911)
  * mqueue: fix a use-after-free in sys_mq_notify() (CVE-2017-11176)

In fact the original big VM with this problem runs happily with:

domid=1: nr_frames=11, max_nr_frames=256

so it is quite possible raising the limit is not needed anymore with the latest stretch kernels. If no-one else can reproduce this anymore I suggest you close the issue but include the xen-diag tool in the updated package. That way if someone reports the problem again it should be easy to detect.

-- Valentin
Bug#888730: [Debian-ha-maintainers] Bug#888730: booth binary-all FTBFS: test failure
On Mon, Jan 29, 2018 at 11:16:51AM +0200, Adrian Bunk wrote:
> Source: booth
> Version: 1.01.0-5
> Severity: serious
>
> https://buildd.debian.org/status/fetch.php?pkg=booth&arch=all&ver=1.0-5&stamp=1516927397&raw=0
>
> ...
> FAILED (failures=1)

The unit tests enabled in the last release are failing for an unknown reason on some architectures. I will try to move them from build time to debian/tests instead.

-- Valentin
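Moving the unit tests from build time to autopkgtest would mean adding a debian/tests/control entry along these lines (a sketch only: the test name and restrictions are assumptions, not booth's actual test setup):

```
Tests: unit-tests
Depends: @builddeps@
Restrictions: allow-stderr, build-needed
```

With build-needed, autopkgtest builds the source tree first so a debian/tests/unit-tests script can run the suite from the build directory; failures then show up per-architecture in the autopkgtest results instead of making the package FTBFS.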
Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64
On Mon, Jan 15, 2018 at 11:12:03AM +0100, Christian Schwamborn wrote:
> Is there a easy way to get/monitor the used 'grants' frames? As I understand
> it, the xen-diag tool you mentioned doesn't compile in xen 4.8?

Here is a status from another host:

domid=0: nr_frames=4, max_nr_frames=256
domid=487: nr_frames=6, max_nr_frames=256
domid=488: nr_frames=5, max_nr_frames=256
domid=489: nr_frames=4, max_nr_frames=256
domid=490: nr_frames=6, max_nr_frames=256
domid=491: nr_frames=7, max_nr_frames=256
domid=492: nr_frames=4, max_nr_frames=256
domid=493: nr_frames=4, max_nr_frames=256
domid=494: nr_frames=29, max_nr_frames=256
domid=495: nr_frames=4, max_nr_frames=256
domid=496: nr_frames=4, max_nr_frames=256
domid=497: nr_frames=5, max_nr_frames=256
domid=498: nr_frames=4, max_nr_frames=256
domid=499: nr_frames=4, max_nr_frames=256
domid=500: nr_frames=4, max_nr_frames=256
domid=501: nr_frames=4, max_nr_frames=256
domid=503: nr_frames=5, max_nr_frames=256
domid=572: nr_frames=13, max_nr_frames=256
domid=575: nr_frames=7, max_nr_frames=256

Most of the hosts have older kernels and nr_frames < 10. And then 494 has a stretch kernel and only 4 vcpus, but is quite close to the current default of 32. Maybe it just depends on the amount of disk IO?

-- Valentin
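Since xen-diag reports one domain per invocation, a small wrapper can scan its output for domains approaching the limit. This is a sketch only: it assumes the exact "domid=N: nr_frames=N, max_nr_frames=N" output format shown above, and the threshold value is an arbitrary choice:

```shell
#!/bin/sh
# Flag domains whose grant-frame usage reaches a threshold (e.g. getting
# close to the old default of 32).  Reads "domid=N: nr_frames=N,
# max_nr_frames=N" lines, as printed by xen-diag gnttab_query_size, on stdin.
flag_high_usage() {
    threshold="$1"
    awk -F'[=:, ]+' -v t="$threshold" \
        '/nr_frames/ { if ($4 >= t) printf "domid=%s uses %s frames\n", $2, $4 }'
}

# Example with two of the numbers from the listing above:
printf 'domid=0: nr_frames=4, max_nr_frames=256\ndomid=494: nr_frames=29, max_nr_frames=256\n' \
    | flag_high_usage 20
# -> domid=494 uses 29 frames
```

On a real host the input would come from looping `./xen-diag gnttab_query_size $domid` over the running domains.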
Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64
On Mon, Jan 15, 2018 at 11:12:03AM +0100, Christian Schwamborn wrote:
> Is there a easy way to get/monitor the used 'grants' frames? As I understand
> it, the xen-diag tool you mentioned doesn't compile in xen 4.8?

I just gave it another try and after modifying xen-diag.c a bit to work with 4.8 here is what I get:

# ./xen-diag gnttab_query_size 0
domid=0: nr_frames=4, max_nr_frames=256
# ./xen-diag gnttab_query_size 1
domid=1: nr_frames=11, max_nr_frames=256
# ./xen-diag gnttab_query_size 0
domid=0: nr_frames=4, max_nr_frames=256
# ./xen-diag gnttab_query_size 1
domid=1: nr_frames=11, max_nr_frames=256
# ./xen-diag gnttab_query_size 5
domid=5: nr_frames=11, max_nr_frames=256

so currently at 11, not high at all. Attaching a patch for the stretch xen package if you want to check your hosts.

-- Valentin

--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -31,6 +31,7 @@
 INSTALL_SBIN += xenpm
 INSTALL_SBIN += xenwatchdogd
 INSTALL_SBIN += xen-livepatch
+INSTALL_SBIN += xen-diag
 INSTALL_SBIN += $(INSTALL_SBIN-y)
 
 # Everything to be installed in a private bin/
@@ -98,6 +99,9 @@
 xen-livepatch: xen-livepatch.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
+xen-diag: xen-diag.o
+	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
+
 xen-lowmemd: xen-lowmemd.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenevtchn) $(LDLIBS_libxenctrl) $(LDLIBS_libxenstore) $(APPEND_LDFLAGS)
--- /dev/null
+++ b/tools/misc/xen-diag.c
@@ -0,0 +1,129 @@
+/*
+ * Copyright (c) 2017 Oracle and/or its affiliates. All rights reserved.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+
+static xc_interface *xch;
+
+#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))
+
+void show_help(void)
+{
+    fprintf(stderr,
+            "xen-diag: xen diagnostic utility\n"
+            "Usage: xen-diag command [args]\n"
+            "Commands:\n"
+            "  help                display this help\n"
+            "  gnttab_query_size   dump the current and max grant frames for \n");
+}
+
+/* wrapper function */
+static int help_func(int argc, char *argv[])
+{
+    show_help();
+    return 0;
+}
+
+static int gnttab_query_size_func(int argc, char *argv[])
+{
+    int domid, rc = 1;
+    struct gnttab_query_size query;
+
+    if ( argc != 1 )
+    {
+        show_help();
+        return rc;
+    }
+
+    domid = strtol(argv[0], NULL, 10);
+    query.dom = domid;
+    rc = xc_gnttab_op(xch, GNTTABOP_query_size, &query, sizeof(query), 1);
+
+    if ( rc == 0 && (query.status == GNTST_okay) )
+        printf("domid=%d: nr_frames=%d, max_nr_frames=%d\n",
+               query.dom, query.nr_frames, query.max_nr_frames);
+
+    return rc == 0 && (query.status == GNTST_okay) ? 0 : 1;
+}
+
+struct {
+    const char *name;
+    int (*function)(int argc, char *argv[]);
+} main_options[] = {
+    { "help", help_func },
+    { "gnttab_query_size", gnttab_query_size_func},
+};
+
+int main(int argc, char *argv[])
+{
+    int ret, i;
+
+    /*
+     * Set stdout to be unbuffered to avoid having to fflush when
+     * printing without a newline.
+     */
+    setvbuf(stdout, NULL, _IONBF, 0);
+
+    if ( argc <= 1 )
+    {
+        show_help();
+        return 0;
+    }
+
+    for ( i = 0; i < ARRAY_SIZE(main_options); i++ )
+        if ( !strncmp(main_options[i].name, argv[1], strlen(argv[1])) )
+            break;
+
+    if ( i == ARRAY_SIZE(main_options) )
+    {
+        show_help();
+        return 0;
+    }
+    else
+    {
+        xch = xc_interface_open(0, 0, 0);
+        if ( !xch )
+        {
+            fprintf(stderr, "failed to get the handler\n");
+            return 0;
+        }
+
+        ret = main_options[i].function(argc - 2, argv + 2);
+
+        xc_interface_close(xch);
+    }
+
+    /*
+     * Exitcode 0 for success.
+     * Exitcode 1 for an error.
+     * Exitcode 2 if the operation should be retried for any reason (e.g. a
+     * timeout or because another operation was in progress).
+     */
+
+#define EXIT_TIMEOUT (EXIT_FAILURE + 1)
+
+    BUILD_BUG_ON(EXIT_SUCCESS != 0);
+    BUILD_BUG_ON(EXIT_FAILURE != 1);
+    BUILD_BUG_ON(EXIT_TIMEOUT != 2);
+
+    switch ( ret )
+    {
+    case 0:
+        return EXIT_SUCCESS;
+    case EAGAIN:
+    case EBUSY:
+        return EXIT_TIMEOUT;
+    default:
+        return EXIT_FAILURE;
+    }
+}
Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64
On Fri, Jan 12, 2018 at 01:34:10AM +0100, Hans van Kranenburg wrote:
> Is the 59 your lots-o-vcpu-monster?

Yes, that is the one with a larger vcpu count.

> I just finished with the initial preparation of a Xen 4.10 package for
> unstable and have it running in my test environment.

Unrelated to this issue, but can you tell me if there is a way to mitigate Meltdown with the Xen 4.8 dom0/domU(PV) running stretch?

> Since this has been reported multiple times already, and upstream has
> bumped it to 64, my verdict would be:
>
> * Bump default to 64 already like upstream did in a later version.
> * Properly document this issue in NEWS.Debian and also mention the
>   option with documentation in the template grub config file, so there's a
>   bigger chance users who run unusually big numbers of disks/nics/cpus/etc
>   will find it.
>
> ...so we also better accommodate users who are using newer kernels in the
> domU with blk-mq, and prevent them from wasting too much time and
> getting frustrated for no reason.
>
> I wouldn't be comfortable with bumping it above the current latest
> greatest upstream default, since it would mean we would need to keep a
> patch in later versions.
>
> I'll prepare a patch to bump the default to 64 in 4.8, taking changes
> from the upstream patch. I probably have to ask upstream (Juergen Gross)
> why the commit that was referenced earlier bumps the default without
> mentioning it in the commit message.

Thanks, 64 should be a good start. If there are still problems reported with that it can be reconsidered.

-- Valentin
Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64
On Sun, Jan 07, 2018 at 07:36:40PM +0100, Hans van Kranenburg wrote:
> Recently a tool was added to "dump guest grant table info". You could
> see if it compiles on the 4.8 source and see if it works? Would be
> interesting to get some idea about how high or low these numbers are in
> different scenarios. I mean, I'm using 128, you 256, and we even don't
> know if the actual value is maybe just above 32? :]
>
> https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=df36d82e3fc91bee2ff1681fd438c815fa324b6a

The diag tool does not build inside xen-4.8:

xen-diag.c: In function ‘gnttab_query_size_func’:
xen-diag.c:50:10: error: implicit declaration of function ‘xc_gnttab_query_size’ [-Werror=implicit-function-declaration]
 rc = xc_gnttab_query_size(xch, &query);
 ^~~~

but I think the same info is available in the thread on xen-devel:

https://www.mail-archive.com/xen-devel@lists.xen.org/msg116910.html

When the domU hangs, crash reports nr_grant_frames=32. After increasing gnttab_max_frames=256 the domU reports using nr_grant_frames=59. So the new default of gnttab_max_frames=64 might be a bit close to 59, but I suppose 128 would be just as safe as the 256 I currently use (if you prefer 128).

> If this is something users are going to run into while not doing more
> unusual things like having dozens of vcpus or network interfaces, then
> changing the default could prevent hours of frustration and debugging
> for them.

Yes, the failure case is quite nasty, as the domU just hangs without even suggesting grant frames might be the problem. Not sure if the domU can detect this situation at all? Anyway, if the value cannot be increased, the situation should at least be mentioned in the NEWS.Debian of the xen package.

-- Valentin
Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64
On Sat, Jan 06, 2018 at 11:17:00PM +0100, Hans van Kranenburg wrote:
> I agree that the upstream default, 32 is quite low. This is indeed a
> configuration issue. I myself ran into this years ago with a growing
> number of domUs and network interfaces in use. We have been using
> gnttab_max_nr_frames=128 for a long time already instead.
>
> I was tempted to reassign src:xen, but in the meantime, this option has
> already been removed again, so this bug does not apply to unstable
> (well, as soon as we get something new in there) any more (as far as I
> can see quickly now).
>
> https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=18b1be5e324bcbe2f10898b116db641d404b3d30

It does not seem to be removed; rather, the default was increased from 32 to 64?

> Including a better default for gnttab_max_nr_frames in the grub config
> in the debian xen package in stable sounds reasonable from a best
> practices point of view.
>
> But, I would be interested in learning more about the relation with
> block mq although. Does using newer linux kernels (like from
> stretch-backports) for the domU always put a bigger strain on this? Or,
> is it just related to the overall number of network devices and block
> devices you are adding to your domUs in your specific own situation, and
> did you just trip over the default limit?

After upgrading the domU and dom0 from jessie to stretch on a big postgresql database server (50 VCPUs, 200GB RAM) it started freezing very soon after boot, as posted here:

https://lists.xen.org/archives/html/xen-users/2017-07/msg00057.html

It did not have these problems while running the jessie versions of the hypervisor and the kernels. The problem seems to be related to the number of CPUs used, as smaller domUs with a few VCPUs did not hang like this. Could it be that a large number of VCPUs -> more queues in the Xen mq driver -> faster exhaustion of the allocated pages?

-- Valentin
Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64
On Sat, Jan 06, 2018 at 03:08:26PM +0100, Yves-Alexis Perez wrote:
> According to that link, the fix seems to be configuration rather than code.
> Does this mean this bug against the kernel should be closed?

Yes, the problem seems to be in the Xen hypervisor and not the Linux kernel itself. The default value for the gnttab_max_frames parameter needs to be increased to avoid domU disk IO hangs, for example:

GRUB_CMDLINE_XEN="dom0_mem=10240M gnttab_max_frames=256"

So either close the bug or reassign it to the xen-hypervisor package so they can increase the default value for this parameter in the hypervisor code.

-- Valentin
Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64
Hi,

The problem seems to be caused by the new multi-queue Xen blk driver, and I was advised by the Xen devs to increase the gnttab_max_frames parameter to 256 for the hypervisor. This has solved the blocking issue for me and it has been running without problems for a few months now.

I/O to LUNs hang / stall under high load when using xen-blkfront
https://www.novell.com/support/kb/doc.php?id=7018590

-- Valentin
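For reference, making the hypervisor option persistent goes through the grub configuration. The snippet below is a sketch only: on Debian the file carrying GRUB_CMDLINE_XEN is typically under /etc/default/grub.d/ (the exact path is an assumption and varies between releases), so it edits a scratch copy rather than a real config file:

```shell
#!/bin/sh
# Sketch: append gnttab_max_frames to an existing GRUB_CMDLINE_XEN line.
# Works on a scratch copy; on a real system the file would be something
# like /etc/default/grub.d/xen.cfg, followed by update-grub and a reboot.
cfg="$(mktemp)"
printf 'GRUB_CMDLINE_XEN="dom0_mem=10240M"\n' > "$cfg"

if grep -q 'gnttab_max_frames=' "$cfg"; then
    echo "gnttab_max_frames already configured"
else
    # Add the option at the end of the quoted Xen command line.
    sed -i 's/^GRUB_CMDLINE_XEN="\(.*\)"$/GRUB_CMDLINE_XEN="\1 gnttab_max_frames=256"/' "$cfg"
fi

cat "$cfg"
# -> GRUB_CMDLINE_XEN="dom0_mem=10240M gnttab_max_frames=256"
```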
Bug#753235: closing 753235
close 753235 thanks
Bug#869986: [Debian-ha-maintainers] Bug#869986: Bug#869986: pacemaker FTBFS: missing symbols
On Mon, Aug 07, 2017 at 02:31:57PM -0400, Ferenc Wágner wrote:
> There's no problem with the Pacemaker libs, the "missing" symbols are a
> manifestation of the binutils incompatibility in the libqb headers.

Ok, I didn't realize the pacemaker FTBFS was caused by the libqb problem. Even better, then we only have one nasty bug to squash :)

-- Valentin
Bug#869986: [Debian-ha-maintainers] Bug#869986: Bug#869986: pacemaker FTBFS: missing symbols
On Mon, Aug 07, 2017 at 09:31:22AM -0400, Ferenc Wágner wrote:
> Absolutely, thanks for this very good find, Valentin! These symbols
> caused problems on non-x86 architectures before, and now libqb is broken
> for good (so we should probably merge this into #871153). Let's see
> what upstream comes up with. Till now I couldn't wrap my head around
> the orphan section linker magic, now their struggle might shed some
> light on the point of all this...

Right, upstream is having problems with libqb, but maybe they don't see the problem with the pacemaker libs if they are not checking the exported symbols. Do you know if these start/stop symbols were used anywhere, or would it be safe to drop them from the pacemaker libs?

#MISSING: 1.1.17-1# (arch=!powerpc !powerpcspe !ppc64 !ppc64el)__start___verbose@Base 1.1.12
#MISSING: 1.1.17-1# (arch=!powerpc !powerpcspe !ppc64 !ppc64el)__stop___verbose@Base 1.1.12

-- Valentin
Bug#869986: [Debian-ha-maintainers] Bug#869986: pacemaker FTBFS: missing symbols
On Fri, Jul 28, 2017 at 04:14:47PM +0300, Adrian Bunk wrote:
> Source: pacemaker
> Version: 1.1.17-1
> Severity: serious
>
> Some recent change in unstable makes pacemaker FTBFS:
>
> https://tests.reproducible-builds.org/debian/history/pacemaker.html
> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/pacemaker.html

Seems to be related to a binutils 2.29 problem reported here:

https://bugzilla.redhat.com/show_bug.cgi?id=1477354

-- Valentin
Bug#857368: [Debian-ha-maintainers] Bug#857368: heartbeat: Heartbeat-Package ist missing package "net-tools" as depency due the need of command "ifconfig".
On Fri, Mar 10, 2017 at 04:31:33PM +0100, Ronny Schneider wrote:
> as I installed the heartbeat package in Debian Stretch, heartbeat failed to
> set the IP addresses, since the command "ifconfig" cannot be run. This command
> is part of the "net-tools" package and thus needed until heartbeat is patched
> to use the new "ip" command instead. Until this is done, the package
> "net-tools" needs to be referenced as a dependency.

I can reproduce the problem, but the error is coming from the resource-agents package (/usr/lib/ocf/resource.d/heartbeat/IPaddr) so I will reassign there.

Mar 11 16:32:04 sid1 ResourceManager(default)[17494]: info: Acquiring resource group: sid1 192.168.122.200
Mar 11 16:32:04 sid1 /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.122.200)[17519]: INFO: Resource is stopped
Mar 11 16:32:04 sid1 ResourceManager(default)[17494]: info: Running /etc/ha.d//resource.d/IPaddr 192.168.122.200 start
Mar 11 16:32:04 sid1 IPaddr(IPaddr_192.168.122.200)[17573]: ERROR: Setup problem: couldn't find command: ifconfig
Mar 11 16:32:04 sid1 /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.122.200)[17561]: ERROR: Program is not installed
Mar 11 16:32:04 sid1 ResourceManager(default)[17494]: ERROR: Return code 5 from /etc/ha.d//resource.d/IPaddr
Mar 11 16:32:04 sid1 ResourceManager(default)[17494]: CRIT: Giving up resources due to failure of 192.168.122.200
Mar 11 16:32:04 sid1 ResourceManager(default)[17494]: info: Releasing resource group: sid1 192.168.122.200

-- Valentin
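The "Return code 5" in the log is the OCF "not installed" error code that an agent raises when a required binary such as ifconfig is missing. A minimal sketch of that kind of guard (an illustration of the pattern only, not the actual IPaddr agent code):

```shell
#!/bin/sh
# Sketch of the guard an OCF resource agent performs before using an
# external tool: return OCF_ERR_INSTALLED (5) when the binary is missing,
# matching the "Return code 5" seen in the log above.
OCF_ERR_INSTALLED=5

check_binary() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "ERROR: Setup problem: couldn't find command: $1" >&2
        return "$OCF_ERR_INSTALLED"
    fi
}

check_binary sh && echo "sh: found"
check_binary no-such-tool-xyz 2>/dev/null || echo "missing, agent would exit 5"
```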
Bug#818961: [Debian-ha-maintainers] Bug#818961: Freeze status, Heartbeat plans
On Wed, Dec 21, 2016 at 03:32:39PM +0100, Valentin Vidic wrote: > node1 IPaddr2::192.168.122.101/24/ens3 drbddisk::drbd0 LVM::cluster > Filesystem::/dev/cluster/srv::/srv::ext4 mysql > apache::/etc/apache2/apache2.conf Also found the following problem in resource-agents but again not related to systemd :) https://github.com/ClusterLabs/resource-agents/pull/905 -- Valentin
Bug#818961: [Debian-ha-maintainers] Freeze status, Heartbeat plans
On Wed, Dec 21, 2016 at 02:24:00PM +0100, Patrick Matthäi wrote:
> DRBD+lvm+ext4, apache and mariadb should be enough

IMHO this setup is too complex for v1, but even that seems to work for me:

node1 IPaddr2::192.168.122.101/24/ens3 drbddisk::drbd0 LVM::cluster Filesystem::/dev/cluster/srv::/srv::ext4 mysql apache::/etc/apache2/apache2.conf

The only problem I found is that during reboot the drbd service is not started and I can't enable it, but this might be an issue with the drbd-utils package:

# systemctl enable drbd
drbd.service is not a native service, redirecting to systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable drbd
update-rc.d: error: drbd Default-Start contains no runlevels, aborting.

-- Valentin
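The update-rc.d error means the init script's LSB header has an empty Default-Start field. For illustration, this is the kind of header update-rc.d would accept (the facility names and runlevels below are generic Debian defaults, not taken from the actual drbd-utils script):

```shell
#!/bin/sh
# Sketch: write an example LSB init header with a non-empty Default-Start,
# the field the "contains no runlevels" error complains about.
hdr="$(mktemp)"
cat > "$hdr" <<'EOF'
### BEGIN INIT INFO
# Provides:          drbd
# Required-Start:    $local_fs $network $syslog
# Required-Stop:     $local_fs $network $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Control DRBD resources
### END INIT INFO
EOF

grep '^# Default-Start:' "$hdr"
```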
Bug#818961: [Debian-ha-maintainers] Freeze status, Heartbeat plans
On Wed, Dec 21, 2016 at 01:15:12PM +0100, Patrick Matthäi wrote:
> We tried out many test cases, workarounds, debugging on this issue and
> the result was, that the v1 mode of heartbeat can not deal with the
> dependency system of systemd and will never support it.

Not sure what could be the problem with systemd dependencies. Do you remember which services were involved so we can do a quick test?

-- Valentin
Bug#796638: [Debian-ha-maintainers] Bug#796638: Patches
On Wed, Jul 06, 2016 at 04:53:05PM +0200, Christian Hofstaedtler wrote: > Yes, but that is still better than not working at all. Ok, I'll try to create a service file for the o2cb cluster stack that is not too complicated. The pacemaker cluster stack does not need the service files as it starts controld and ocfs2 filesystem from resource scripts. -- Valentin
Bug#823583: gfs2-utils: FTBFS: libgfs2 unit tests [..] FAILED (libgfs2.at:4)
On Fri, May 06, 2016 at 10:21:33AM +0100, Chris Lamb wrote:
> libgfs2 unit tests
>
>  24: meta.c      FAILED (libgfs2.at:4)
>  25: rgrp.c      ok

The build test fails for me on unstable too due to a 2 byte increase in a structure size. Was there some recent compiler or lib change in unstable that would have caused this?

## ------------- ##
## Test results. ##
## ------------- ##

ERROR: All 25 tests were run,
1 failed unexpectedly.

## ------------------------ ##
## Summary of the failures. ##
## ------------------------ ##

Failed tests:
gfs2-utils master test suite test groups:

 NUM: FILE-NAME:LINE  TEST-GROUP-NAME
      KEYWORDS

  24: libgfs2.at:3    meta.c
      libgfs2

## ---------------------- ##
## Detailed failed tests. ##
## ---------------------- ##

# -*- compilation -*-
24. libgfs2.at:3: testing meta.c ...
./libgfs2.at:3: test x"$ENABLE_UNIT_TESTS" = "xyes" || exit 77
./libgfs2.at:4: check_meta
stderr:
gfs2_dirent: __pad: offset is 28, expected 26
gfs2_dirent: size mismatch between struct 40 and fields 38
stdout:
Running suite(s): libgfs2
0%: Checks: 1, Failures: 1, Errors: 0
check_meta.c:9:F:Meta:test_lgfs2_meta:0: Assertion 'lgfs2_selfcheck() == 0' failed
./libgfs2.at:4: exit code was 1, expected 0
24. libgfs2.at:3: 24. meta.c (libgfs2.at:3): FAILED (libgfs2.at:4)

-- Valentin
Bug#711628: libhttp-daemon-ssl-perl: FTBFS: test failure
On Sat, Jun 08, 2013 at 01:02:56PM +0100, Dominic Hargreaves wrote:
> Source: libhttp-daemon-ssl-perl
> Version: 1.04-3
> Severity: serious
> Justification: FTBFS
>
> This package FTBFS (in a clean sid sbuild session):
>
> Can't call method "get_request" on an undefined value at t/testmodule.t line 90.
> t/testmodule.t ..
> Dubious, test returned 255 (wstat 65280, 0xff00)
> Failed 1/2 test programs. 2/10 subtests failed.
> Failed 4/9 subtests

The package seems to work, but the test case needs some tweaks to work with more recent SSL libs (patch attached).

-- Valentin

--- t/testmodule.t	2008-02-12 02:27:01.0 +0100
+++ t/testmodule2.t	2015-09-29 14:07:23.792135915 +0200
@@ -36,6 +36,7 @@
 
 $client = new IO::Socket::SSL(PeerAddr => $SSL_SERVER_ADDR,
 			      PeerPort => $SSL_SERVER_PORT,
+			      SSL_version => 'TLSv1',
 			      SSL_verify_mode => 0x01,
 			      SSL_ca_file => "certs/test-ca.pem");
 
@@ -58,7 +59,7 @@
 			      Timeout => 30,
 			      ReuseAddr => 1,
 			      SSL_verify_mode => 0x00,
-			      SSL_ca_file => "certs/test-ca.pem",
+			      SSL_key_file => "certs/server-key.pem",
 			      SSL_cert_file => "certs/server-cert.pem");
 
 if (!$server) {
Bug#698118: asterisk 1:1.6.2.9-2+squeeze9 segfaults
Same thing here, started segfaulting after an upgrade this morning:

2013-01-14 10:32:13 upgrade asterisk 1:1.6.2.9-2+squeeze8 1:1.6.2.9-2+squeeze9

[783312.661049] asterisk[27654]: segfault at 1 ip b748db77 sp b5319684 error 4 in libc-2.11.3.so[b7418000+14]
[783442.211589] asterisk[13070]: segfault at 1 ip b74a0b77 sp b532d684 error 4 in libc-2.11.3.so[b742b000+14]
[787731.493578] asterisk[13304]: segfault at 1 ip b74b6b77 sp b5344684 error 4 in libc-2.11.3.so[b7441000+14]
[787933.505841] asterisk[937]: segfault at 1 ip b750fb77 sp b539c684 error 4 in libc-2.11.3.so[b749a000+14]
[788010.077989] asterisk[2168]: segfault at 1 ip b745db77 sp b52eb684 error 4 in libc-2.11.3.so[b73e8000+14]
[788592.836440] asterisk[2359]: segfault at 1 ip b7550b77 sp b53dc684 error 4 in libc-2.11.3.so[b74db000+14]
[792704.434687] asterisk[6096]: segfault at 1 ip b746cb77 sp b52fa684 error 4 in libc-2.11.3.so[b73f7000+14]
[793003.009440] asterisk[25102]: segfault at 1 ip b751db77 sp b53a9684 error 4 in libc-2.11.3.so[b74a8000+14]

-- Valentin