Bug#414092: airport-utils: Tools start and quit immediately without working

2023-03-05 Thread Valentin Vidic
I just did a quick check and the tools work for me on the current stable
(11.6) and unstable.

The only time they don't start is when X11 is not available, such as in
an ssh session:

$ java -verbose:class -jar /usr/share/java/airport-utils/AirportBaseStationConfig.jar
...
[0.170s][info][class,load] sun.awt.MostRecentKeyValue source: jrt:/java.desktop
[0.170s][info][class,load] sun.awt.PostEventQueue source: jrt:/java.desktop
[0.171s][info][class,load] java.util.Vector source: jrt:/java.base
[0.171s][info][class,load] java.awt.Window$Type source: jrt:/java.desktop
[0.171s][info][class,load] java.lang.UnsupportedOperationException source: jrt:/java.base
[0.171s][info][class,load] java.awt.HeadlessException source: jrt:/java.desktop
[0.171s][info][class,load] java.util.IdentityHashMap$IdentityHashMapIterator source: shared objects file
[0.171s][info][class,load] java.util.IdentityHashMap$KeyIterator source: shared objects file
[0.171s][info][class,load] java.lang.Shutdown source: shared objects file
[0.171s][info][class,load] java.lang.Shutdown$Lock source: shared objects file

I suppose the startup scripts could check whether X11 is available and
print a warning when it is not?
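As a sketch of that idea (the guard function and the warning text below are hypothetical, not the actual wrapper contents), the scripts could test DISPLAY before launching the JVM:

```shell
# Hypothetical guard for the airport-utils wrapper scripts: bail out with a
# readable warning instead of letting the JVM die with a HeadlessException.
check_display() {
    if [ -z "${DISPLAY:-}" ] && [ -z "${WAYLAND_DISPLAY:-}" ]; then
        echo "warning: no X11 display available (try ssh -X)" >&2
        return 1
    fi
}

if check_display; then
    echo "display found, would start the Java GUI"
else
    echo "no display, skipping launch"
fi
```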

-- 
Valentin



Bug#1018930: [Debian-ha-maintainers] Bug#1018930: marked as done (pcs: CVE-2022-2735: Obtaining an authentication token for hacluster user leads to privilege escalation)

2022-09-07 Thread Valentin Vidic
I checked pcs 0.10.1-2 in buster and it turns out it is not vulnerable
to CVE-2022-2735. The separate Ruby daemon with a world-writable UNIX
socket was only introduced later, in 0.10.5:

https://salsa.debian.org/ha-team/pcs/-/commits/master/pcsd/pcsd-ruby.service.in

Before that version, the Python code runs Ruby commands and they
communicate by sending JSON responses over stdin/stdout.

https://salsa.debian.org/ha-team/pcs/-/blob/38330deb0d849d6a1945856b24323043f6a7839b/pcs/daemon/ruby_pcsd.py
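As a rough illustration of that stdin/stdout transport (the command name and JSON fields below are invented, and python3 stands in for both the Python and the Ruby side):

```shell
# Parent writes one JSON request to the child's stdin and reads one JSON
# response from its stdout -- no long-running daemon and no UNIX socket.
printf '%s\n' '{"cmd": "cluster_status"}' | python3 -c '
import json, sys
request = json.loads(sys.stdin.readline())
json.dump({"status": "ok", "cmd": request["cmd"]}, sys.stdout)
print()
'
```

The world-writable socket only becomes a concern once a long-running daemon replaces this per-request pipe.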

-- 
Valentin



Bug#1008379: closing 1008379

2022-05-17 Thread Valentin Vidic
close 1008379 1.5.1-1
thanks

Tested the build of the new package release in sbuild
sid chroot and did not see any problems.

-- 
Valentin



Bug#994418: ocfs2-tools: failing autopkgtest on one of ci.d.n amd64 workers

2021-09-16 Thread Valentin Vidic
Hi Paul,

On Thu, Sep 16, 2021 at 08:34:06AM +0200, Paul Gevers wrote:
> It was pointed out to me on IRC that the mount of /tmp with `nodev` is
> probably the issue here. I'm discussion if we should just drop that.

The failing test does not use a device, so this probably won't help. I
tried updating the test to use losetup, but it turns out losetup does
not work inside lxc.

It seems that O_DIRECT on tmpfs is a known problem, and other software
like MySQL also doesn't work on tmpfs. There were some kernel patches
to allow O_DIRECT on tmpfs, but they were apparently not accepted.

Would it be possible not to use tmpfs for $AUTOPKGTEST_TMP, or was that
the goal in the first place?
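For reference, a quick probe for this (a sketch: dd's oflag=direct maps to open(2) with O_DIRECT, and filesystems without support fail the write with EINVAL):

```shell
# Probe whether the filesystem behind a temporary directory accepts
# O_DIRECT I/O; on tmpfs the write is expected to fail with EINVAL.
dir=$(mktemp -d)
if dd if=/dev/zero of="$dir/probe" bs=4096 count=1 oflag=direct 2>/dev/null; then
    echo "O_DIRECT supported"
else
    echo "O_DIRECT not supported"
fi
rm -rf "$dir"
```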

-- 
Valentin



Bug#994418: ocfs2-tools: failing autopkgtest on one of ci.d.n amd64 workers

2021-09-15 Thread Valentin Vidic
On Wed, Sep 15, 2021 at 09:24:08PM +0200, Paul Gevers wrote:
> I looked at the results of the autopkgtest of you package on amd64
> because with a recent upload of glibc the autopkgtest of ocfs2-tools
> fails in testing. It seems to me that the failures are related to the
> worker that the test runs on. ci-worker13 fails, while the other workers
> are OK. We recently changed the setup of ci-worker13, to have /tmp/ of
> the host on tmpfs as that speeds up testing considerably is a lot of
> cases. I copied some of the output at the bottom of this report, but I'm
> not 100% sure that the /tmp there (the one inside the lxc testbed) *is*
> on tmpfs.
> 
> Don't hesitate to contact us at debian...@lists.debian.org if you need
> help debugging this issue.
> 
> Paul
> 
> https://ci.debian.net/data/autopkgtest/testing/amd64/o/ocfs2-tools/15277216/log.gz
> 
> 
> autopkgtest [19:14:22]: test basic: [---
> 
> === disk ===
> 200+0 records in
> 200+0 records out
> 209715200 bytes (210 MB, 200 MiB) copied, 0.109005 s, 1.9 GB/s
> 
> === mkfs ===
> mkfs.ocfs2 1.8.6
> mkfs.ocfs2: Could not open device
> /tmp/autopkgtest-lxc.8neywhcx/downtmp/autopkgtest_tmp/disk: Invalid argument
> autopkgtest [19:14:23]: test basic: ---]

Yes, tmpfs seems to be the problem, since it doesn't support the
O_DIRECT flag that is being requested here:

static void
open_device(State *s)
{
	s->fd = open64(s->device_name, O_RDWR | O_DIRECT);

	if (s->fd == -1) {
		com_err(s->progname, 0,
			"Could not open device %s: %s",
			s->device_name, strerror(errno));
		exit(1);
	}
}

-- 
Valentin



Bug#987441: s390x installation bugs

2021-08-02 Thread Valentin Vidic
On Sun, Aug 01, 2021 at 09:45:00PM +0200, Valentin Vidic wrote:
> Thanks, that does sound similar to what I was getting there. I will try
> to see if it still happens with the latest installer. And since it
> crashes on start I had no way to access the logs or dmesg of the
> machine. Perhaps there is some installer option to help debug this kind
> of thing?

Just tested the rc3 installation with qemu-system-s390x. Installation
went fairly quickly and without any problems. Great work everyone and
happy release :)))

-- 
Valentin



Bug#987441: s390x installation bugs

2021-08-01 Thread Valentin Vidic
On Sun, Aug 01, 2021 at 05:10:07PM +0200, Cyril Brulebois wrote:
> Valentin Vidic  (2021-08-01):
> > No problem, I was not able to reproduce this reliably or get a core
> > dump for this crash. It could just be an emulation problem with qemu
> > or some timing issue for the first installer step. If there is no
> > update on this problem I think we can even close it for now.
> 
> Speaking of the first step, did anyone mention #987368 before, now fixed
> in udpkg?

Thanks, that does sound similar to what I was getting there. I will try
to see if it still happens with the latest installer. And since it
crashes on start I had no way to access the logs or dmesg of the
machine. Perhaps there is some installer option to help debug this kind
of thing?

-- 
Valentin



Bug#987441: s390x installation bugs

2021-05-03 Thread Valentin Vidic
On Mon, May 03, 2021 at 08:36:58AM +0200, Cyril Brulebois wrote:
> Also adding to the list of bugs to keep an eye on (again, possibly not
> blocking the release on its being resolved; we could have the issue
> listed in errata, and possibly fixed in a point release).

Thanks, here is another one for s390x; it should be relatively simple,
if you wish to link it here:

linux: Debian installation fails in qemu-system-s390x due to missing virtio_blk module
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=988005

-- 
Valentin



Bug#987441: Bug#926539: Bug#987441: s390x installation bugs

2021-05-03 Thread Valentin Vidic
On Mon, May 03, 2021 at 08:58:02AM +0200, John Paul Adrian Glaubitz wrote:
> > The same issue exists on s390x but isn't apparently going to get fixed
> > so we need to have d-i be smarter (hence the merge request)?
> 
> Seems so.

The QEMU console might get fixed in the kernel, but it looks like LPAR
could have a similar problem (I don't have access to test this). So it
seems better (and more future-proof) to fix this on the Debian side too.
I have updated the merge request to trigger the new code only on s390x,
as suggested:

https://salsa.debian.org/installer-team/rootskel/-/merge_requests/2

> > I'd suggest at least retitling the bug report to mention s390x (release
> > arch, affected) instead of sparc64 (port arch, no longer affected), to
> > lower the chances people could overlook this issue, thinking it's only
> > about a port arch.
> 
> We could also unmerge #926539 and #961056 again, then close the former bug
> which was sparc64-specific.

I have unmerged the bugs now, so the sparc one can be closed.

-- 
Valentin



Bug#987441: s390x installation bugs

2021-05-02 Thread Valentin Vidic
Hi,

Probably not critical, but maybe these installation bugs on s390x could
be fixed for the release?

rootskel: steal-ctty no longer works on at least sparc64
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=926539

debian-installer: qemu-system-s390x installation fails due to segfault in main-menu
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=987788

-- 
Valentin



Bug#987351: claim bug

2021-04-24 Thread Valentin Vidic
user debian-rele...@lists.debian.org
usertags -1 + bsp-2021-04-AT-Salzburg   
thank you

-- 
Valentin



Bug#975543: fence-agents autopkg tests time out

2020-11-23 Thread Valentin Vidic
On Mon, Nov 23, 2020 at 11:46:54AM +0100, Matthias Klose wrote:
> Package: src:fence-agents
> Version: 4.6.0-2
> Severity: serious
> Tags: sid bullseye
> User: debian-pyt...@lists.debian.org
> Usertags: python3.9
> 
> fence-agents autopkg tests time out, might not be Python 3.9 specific.

Yup, this should be the dash wait hang tracked in #974705.

-- 
Valentin



Bug#974705: fence-agents test hangs

2020-11-14 Thread Valentin Vidic
One of the autopkgtests in the fence-agents package seems to be broken
in the same way: it just hangs in wait forever, while switching the
script to bash makes it work:
https://salsa.debian.org/ha-team/fence-agents/-/blob/master/debian/tests/delay

-- 
Valentin



Bug#945881: bgpdump: Segmentation fault

2019-11-30 Thread Valentin Vidic
Package: bgpdump
Version: 1.6.0-1
Severity: grave

Dear Maintainer,

The program segfaults when started:

$ bgpdump 
Segmentation fault

Based on gdb info it seems like the call to log_to_stderr fails:

(gdb) bt
#0  0x2246 in ?? ()
#1  0x77fef5cf in main ()

(gdb) disas main
Dump of assembler code for function main:
   0x77fef5a0 <+0>:	push   %r15
   0x77fef5a2 <+2>:	xor    %r15d,%r15d
   0x77fef5a5 <+5>:	push   %r14
   0x77fef5a7 <+7>:	mov    $0x1,%r14d
   0x77fef5ad <+13>:	push   %r13
   0x77fef5af <+15>:	lea    0xa055(%rip),%r13	# 0x77ff960b
   0x77fef5b6 <+22>:	push   %r12
   0x77fef5b8 <+24>:	mov    %rsi,%r12
   0x77fef5bb <+27>:	push   %rbp
   0x77fef5bc <+28>:	mov    %edi,%ebp
   0x77fef5be <+30>:	push   %rbx
   0x77fef5bf <+31>:	lea    0xa7c2(%rip),%rbx	# 0x77ff9d88
   0x77fef5c6 <+38>:	sub    $0x18,%rsp
=> 0x77fef5ca <+42>:	callq  0x77fef240

-- System Information:
Debian Release: 10.2
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.19.0-6-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages bgpdump depends on:
ii  libbsd0 0.9.1-2
ii  libbz2-1.0  1.0.6-9.2~deb10u1
ii  libc6   2.28-10
ii  zlib1g  1:1.2.11.dfsg-1

bgpdump recommends no packages.

bgpdump suggests no packages.

-- no debconf information



Bug#935562: ITP: google-auth-httplib2 -- Google Authentication Library: httplib2 transport

2019-09-30 Thread Valentin Vidic
Package: wnpp
Severity: wishlist
Owner: Valentin Vidic 

* Package name: google-auth-httplib2
  Version : 0.0.3
  Upstream Author : Google Cloud Platform 
* URL : 
https://github.com/GoogleCloudPlatform/google-auth-library-python-httplib2
* License : Apache 2.0
  Programming Lang: Python
  Description : Google Authentication Library: httplib2 transport

Python library providing an httplib2 transport for google-auth.
This library is intended to help existing users of oauth2client migrate
to google-auth.

The intent of this package is to be used together with the existing
python3-googleapi package (see #935562). The package will be maintained
by the DPMT on salsa.



Bug#934519: stretch-pu: package fence-agents/4.0.25-1+deb9u2

2019-09-29 Thread Valentin Vidic
Package: release.debian.org
Severity: normal
Tags: stretch
User: release.debian@packages.debian.org
Usertags: pu

Hi,

Please allow an update for the fence-agents package fixing an occasional
FTBFS reported in #934519. The patch for the change follows below.

diff -Nru fence-agents-4.0.25/debian/changelog fence-agents-4.0.25/debian/changelog
--- fence-agents-4.0.25/debian/changelog	2019-06-30 19:01:55.0 +0200
+++ fence-agents-4.0.25/debian/changelog	2019-09-29 12:27:01.0 +0200
@@ -1,3 +1,9 @@
+fence-agents (4.0.25-1+deb9u2) stretch; urgency=medium
+
+  * Update patch for removing fence_amt_ws (Closes: #934519)
+
+ -- Valentin Vidic   Sun, 29 Sep 2019 12:27:01 +0200
+
 fence-agents (4.0.25-1+deb9u1) stretch; urgency=medium
 
   * fence_rhevm: add patch for CVE-2019-10153 (Closes: #930887)
diff -Nru fence-agents-4.0.25/debian/patches/remove-fence_amt_ws fence-agents-4.0.25/debian/patches/remove-fence_amt_ws
--- fence-agents-4.0.25/debian/patches/remove-fence_amt_ws	2019-06-30 19:01:55.0 +0200
+++ fence-agents-4.0.25/debian/patches/remove-fence_amt_ws	2019-09-29 12:27:01.0 +0200
@@ -1,16 +1,16 @@
 --- a/configure.ac
 +++ b/configure.ac
-@@ -142,6 +142,9 @@ if test "x$AGENTS_LIST" = xall; then
-   AGENTS_LIST=`find $srcdir/fence/agents -mindepth 2 -maxdepth 2 -name '*.py' -printf '%P ' | sed -e 's#lib/[A-Za-z_.]* ##g' -e 's#nss_wrapper/[A-Za-z_.]* ##g' -e 's#autodetect/[A-Za-z_.]* ##g'`
+@@ -139,7 +139,8 @@
+ fi
+ 
+ if test "x$AGENTS_LIST" = xall; then
+-  AGENTS_LIST=`find $srcdir/fence/agents -mindepth 2 -maxdepth 2 -name '*.py' -printf '%P ' | sed -e 's#lib/[A-Za-z_.]* ##g' -e 's#nss_wrapper/[A-Za-z_.]* ##g' -e 's#autodetect/[A-Za-z_.]* ##g'`
++  # remove fence_amt_ws because we don't have openwsman (and sblim-sfcc) in Debian
++  AGENTS_LIST=`find $srcdir/fence/agents -mindepth 2 -maxdepth 2 -name '*.py' ! -name 'fence_amt_ws.py' -printf '%P ' | sed -e 's#lib/[A-Za-z_.]* ##g' -e 's#nss_wrapper/[A-Za-z_.]* ##g' -e 's#autodetect/[A-Za-z_.]* ##g'`
  fi
  
-+# remove fence_amt_ws because we don't have openwsman (and sblim-sfcc) in Debian
-+AGENTS_LIST=$(echo $AGENTS_LIST | sed -e "s!amt_ws/fence_amt_ws.py !!")
-+
  XENAPILIB=0
- if echo "$AGENTS_LIST" | grep -q xenapi; then
-   XENAPILIB=1
-@@ -163,7 +166,8 @@ AC_PYTHON_MODULE(suds, 1)
+@@ -163,7 +164,8 @@
  AC_PYTHON_MODULE(pexpect, 1)
  AC_PYTHON_MODULE(pycurl, 1)
  AC_PYTHON_MODULE(requests, 1)



Bug#934519: buster-pu: package fence-agents/4.3.3-2+deb10u1

2019-09-29 Thread Valentin Vidic
Package: release.debian.org
Severity: normal
Tags: buster
User: release.debian@packages.debian.org
Usertags: pu

Hi,

Please allow an update for the fence-agents package fixing an occasional
FTBFS reported in #934519. The patch for the change follows below.

diff -Nru fence-agents-4.3.3/debian/changelog fence-agents-4.3.3/debian/changelog
--- fence-agents-4.3.3/debian/changelog	2019-06-23 19:53:35.0 +0200
+++ fence-agents-4.3.3/debian/changelog	2019-09-29 11:54:16.0 +0200
@@ -1,3 +1,9 @@
+fence-agents (4.3.3-2+deb10u1) buster; urgency=medium
+
+  * Update patch for removing fence_amt_ws (Closes: #934519)
+
+ -- Valentin Vidic   Sun, 29 Sep 2019 11:54:16 +0200
+
 fence-agents (4.3.3-2) unstable; urgency=high
 
   * fence_rhevm: add patch for CVE-2019-10153 (Closes: #930887)
diff -Nru fence-agents-4.3.3/debian/patches/remove-fence_amt_ws fence-agents-4.3.3/debian/patches/remove-fence_amt_ws
--- fence-agents-4.3.3/debian/patches/remove-fence_amt_ws	2018-10-06 22:30:46.0 +0200
+++ fence-agents-4.3.3/debian/patches/remove-fence_amt_ws	2019-09-29 11:52:14.0 +0200
@@ -6,13 +6,13 @@
 This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
 --- a/configure.ac
 +++ b/configure.ac
-@@ -176,6 +176,9 @@
-   AGENTS_LIST=`find $srcdir/agents -mindepth 2 -maxdepth 2 -name 'fence_*.py' -print0 | xargs -0 | sed -e 's#[^ ]*/agents/##g' -e 's#lib/[A-Za-z_.]* ##g' -e 's#nss_wrapper/[A-Za-z_.]* ##g' -e 's#autodetect/[A-Za-z_.]* ##g'`
+@@ -175,7 +175,8 @@
+ fi
+ 
+ if test "x$AGENTS_LIST" = xall; then
+-  AGENTS_LIST=`find $srcdir/agents -mindepth 2 -maxdepth 2 -name 'fence_*.py' -print0 | xargs -0 | sed -e 's#[^ ]*/agents/##g' -e 's#lib/[A-Za-z_.]* ##g' -e 's#nss_wrapper/[A-Za-z_.]* ##g' -e 's#autodetect/[A-Za-z_.]* ##g'`
++  # remove fence_amt_ws because we don't have openwsman (and sblim-sfcc) in Debian
++  AGENTS_LIST=`find $srcdir/agents -mindepth 2 -maxdepth 2 -name 'fence_*.py' ! -name fence_amt_ws.py -print0 | xargs -0 | sed -e 's#[^ ]*/agents/##g' -e 's#lib/[A-Za-z_.]* ##g' -e 's#nss_wrapper/[A-Za-z_.]* ##g' -e 's#autodetect/[A-Za-z_.]* ##g'`
  fi
  
-+# remove fence_amt_ws because we don't have openwsman (and sblim-sfcc) in Debian
-+AGENTS_LIST=$(echo $AGENTS_LIST | sed -e "s!amt_ws/fence_amt_ws.py !!")
-+
  XENAPILIB=0
- if echo "$AGENTS_LIST" | grep -q xenapi; then
-   XENAPILIB=1


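The effect of the new `! -name` exclusion can be checked in isolation with a throwaway tree (the directory and agent names below are made up for illustration):

```shell
# Build a fake agents tree and list agent scripts the way configure.ac does,
# skipping fence_amt_ws.py via find's ! -name predicate (GNU find, -printf).
tmp=$(mktemp -d)
mkdir -p "$tmp/agents/amt_ws" "$tmp/agents/ipmilan"
touch "$tmp/agents/amt_ws/fence_amt_ws.py" "$tmp/agents/ipmilan/fence_ipmilan.py"
find "$tmp/agents" -mindepth 2 -maxdepth 2 -name 'fence_*.py' \
    ! -name fence_amt_ws.py -printf '%P\n'
rm -rf "$tmp"
```

Only `ipmilan/fence_ipmilan.py` is printed; the excluded agent never enters AGENTS_LIST, so the later sed cleanup is no longer needed.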

Bug#925354: [Debian-ha-maintainers] Bug#925354: pacemaker-dev: missing Breaks+Replaces: libcrmcluster1-dev

2019-03-25 Thread Valentin Vidic
On Mon, Mar 25, 2019 at 03:45:58PM +0100, Andreas Beckmann wrote:
> In that case you should probably add Breaks+Replaces against all of the
> old -dev packages that were merged, just to be on the safe side.

Yes, that is the plan. I think wferi will take care of it if he
has time?

-- 
Valentin



Bug#925354: [Debian-ha-maintainers] Bug#925354: pacemaker-dev: missing Breaks+Replaces: libcrmcluster1-dev

2019-03-25 Thread Valentin Vidic
On Sat, Mar 23, 2019 at 05:19:59PM +0100, Andreas Beckmann wrote:
> during a test with piuparts I noticed your package fails to upgrade from
> 'wheezy' to 'jessie' to 'stretch' to 'buster'.
> It installed fine in 'wheezy', and upgraded to 'jessie' and 'stretch'
> successfully,
> but then the upgrade to 'buster' failed.
> 
> In case the package was not part of an intermediate stable release,
> the version from the preceding stable release was kept installed.
> 
> From the attached log (scroll to the bottom...):
> 
>   Selecting previously unselected package pacemaker-dev:amd64.
>   Preparing to unpack .../10-pacemaker-dev_2.0.1-1_amd64.deb ...
>   Unpacking pacemaker-dev:amd64 (2.0.1-1) ...
>   dpkg: error processing archive 
> /tmp/apt-dpkg-install-UW7jMV/10-pacemaker-dev_2.0.1-1_amd64.deb (--unpack):
>trying to overwrite '/usr/include/pacemaker/crm/attrd.h', which is also in 
> package libcrmcluster1-dev 1.1.7-1
>   dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)
>   Errors were encountered while processing:
>/tmp/apt-dpkg-install-UW7jMV/10-pacemaker-dev_2.0.1-1_amd64.deb

Yep, all -dev packages were merged at one point into pacemaker-dev.
Breaks+Replaces on old packages should do the trick here.

-- 
Valentin



Bug#776246: Processed: severity of 776246 is grave

2019-02-21 Thread Valentin Vidic
On Tue, Feb 19, 2019 at 10:26:09AM +0100, Christoph Martin wrote:
> What can we do to not loose these packages (burp in my case)?
> 
> librsync  2.0.2-1~exp1 was uploaded to experimental three days ago.

csync2 seems to build fine with librsync2 from experimental, so if
you can upload that to unstable, maybe we can still save some of
the affected packages.

-- 
Valentin



Bug#776246: Processed: severity of 776246 is grave

2019-02-19 Thread Valentin Vidic
On Tue, Feb 19, 2019 at 10:26:09AM +0100, Christoph Martin wrote:
> What can we do to not loose these packages (burp in my case)?
> 
> librsync  2.0.2-1~exp1 was uploaded to experimental three days ago.

I guess librsync2 would need to go into unstable and testing. Then
we can try to update our apps to the new API so they can enter testing
again. I'm not sure this is realistic at this point in the release
process, which is why I suggested setting severity grave only after
buster is out.

-- 
Valentin



Bug#776246: Processed: severity of 776246 is grave

2019-02-18 Thread Valentin Vidic
Hi,

I'm not sure why the severity was raised to grave this late in the
release process, causing us to lose some packages (csync2 in my case).
Raising it to grave after the release would give us more time to move
to librsync2.

-- 
Valentin



Bug#919901: [Debian-ha-maintainers] Bug#919901: Bug#919901: corosync-qnetd: fails to upgrade from 'stretch': certutil: Could not set password for the slot

2019-01-24 Thread Valentin Vidic
On Thu, Jan 24, 2019 at 10:27:39PM +0100, Valentin Vidic wrote:
> Password file indeed seems to be empty on stretch:
> 
> drwxr-x--- 2 root coroqnetd  4096 Jan 24 22:22 .
> drwxr-xr-x 3 root root       4096 Jan 24 22:22 ..
> -rw-r----- 1 root coroqnetd 65536 Jan 24 22:22 cert8.db
> -rw-r----- 1 root coroqnetd 16384 Jan 24 22:22 key3.db
> -rw-r----- 1 root root         41 Jan 24 22:22 noise.txt
> -rw-r----- 1 root root          0 Jan 24 22:22 pwdfile.txt
> -rw-r--r-- 1 root root       4223 Jan 24 22:22 qnetd-cacert.crt
> -rw-r----- 1 root root      16384 Jan 24 22:22 secmod.db
> -rw-r----- 1 root root          4 Jan 24 22:22 serial.txt

It seems the magic upgrade command is:

  # password file should have an empty line to be accepted
  test -f "$db/pwdfile.txt" -a ! -s "$db/pwdfile.txt" && echo > "$db/pwdfile.txt"
  certutil -N -d "sql:$db" -f "$db/pwdfile.txt" -@ "$db/pwdfile.txt"
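The password-file half of this can be tried without certutil (the mktemp directory below is a stand-in for the real NSS db directory):

```shell
# Reproduce just the password-file part of the fix: stretch ships a
# zero-length pwdfile.txt, and certutil only accepts one containing an
# empty line, so append a newline when the file exists but is empty.
db=$(mktemp -d)
: > "$db/pwdfile.txt"            # zero-length file, as found on stretch
if [ -f "$db/pwdfile.txt" ] && [ ! -s "$db/pwdfile.txt" ]; then
    echo > "$db/pwdfile.txt"
fi
wc -c < "$db/pwdfile.txt"        # now 1 byte: a single newline
rm -rf "$db"
```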

-- 
Valentin



Bug#919901: [Debian-ha-maintainers] Bug#919901: corosync-qnetd: fails to upgrade from 'stretch': certutil: Could not set password for the slot

2019-01-24 Thread Valentin Vidic
On Sun, Jan 20, 2019 at 05:07:25PM +0100, Andreas Beckmann wrote:
>   Setting up corosync-qnetd (3.0.0-1) ...
>   password file contains no data
>   Invalid password.
>   certutil: Could not set password for the slot: SEC_ERROR_INVALID_ARGS: 
> security library: invalid arguments.
>   dpkg: error processing package corosync-qnetd (--configure):
>installed corosync-qnetd package post-installation script subprocess 
> returned error exit status 255
>   Processing triggers for libc-bin (2.28-5) ...
>   Errors were encountered while processing:
>corosync-qnetd

Password file indeed seems to be empty on stretch:

drwxr-x--- 2 root coroqnetd  4096 Jan 24 22:22 .
drwxr-xr-x 3 root root       4096 Jan 24 22:22 ..
-rw-r----- 1 root coroqnetd 65536 Jan 24 22:22 cert8.db
-rw-r----- 1 root coroqnetd 16384 Jan 24 22:22 key3.db
-rw-r----- 1 root root         41 Jan 24 22:22 noise.txt
-rw-r----- 1 root root          0 Jan 24 22:22 pwdfile.txt
-rw-r--r-- 1 root root       4223 Jan 24 22:22 qnetd-cacert.crt
-rw-r----- 1 root root      16384 Jan 24 22:22 secmod.db
-rw-r----- 1 root root          4 Jan 24 22:22 serial.txt

-- 
Valentin



Bug#918944: [Debian-ha-maintainers] Bug#918944: Autopkgtest failure with rails 5/rack 2

2019-01-10 Thread Valentin Vidic
On Fri, Jan 11, 2019 at 12:32:05AM +0530, Pirate Praveen wrote:
> Package: pcs
> Version: 0.9.166-5
> Severity: serious
> 
> https://ci.debian.net/packages/p/pcs/unstable/amd64
> 
> May be 0.10 version has a fix, it is delaying rails 5 testing
> migration, so please fix it.

Yep, I'm looking at 0.10.1, but it has a lot of changes so it might take
a few more days to get it working.

-- 
Valentin



Bug#911801: closing 911801

2018-11-12 Thread Valentin Vidic
close 911801 
thanks

Trying to run the setup command always returns 401:

# pcs cluster setup --name pacemaker1 stretch1 stretch2
Error: stretch1: unable to authenticate to node
Error: stretch2: unable to authenticate to node
Error: nodes availability check failed, use --force to override. WARNING: This will destroy existing cluster on the nodes.

# pcs cluster setup --name pacemaker1 stretch1 stretch2 --force
Destroying cluster on nodes: stretch1, stretch2...
stretch1: Unable to authenticate to stretch1 - (HTTP error: 401), try running 'pcs cluster auth'
stretch2: Unable to authenticate to stretch2 - (HTTP error: 401), try running 'pcs cluster auth'
stretch2: Unable to authenticate to stretch2 - (HTTP error: 401), try running 'pcs cluster auth'
stretch1: Unable to authenticate to stretch1 - (HTTP error: 401), try running 'pcs cluster auth'
Error: unable to destroy cluster
stretch2: Unable to authenticate to stretch2 - (HTTP error: 401), try running 'pcs cluster auth'
stretch1: Unable to authenticate to stretch1 - (HTTP error: 401), try running 'pcs cluster auth'

Also, even when running with --force, no file gets removed, so I see no
reason for severity grave with justification "causes non-serious data loss".

Instructions in README.Debian should still work, so please try to use
those for setting up pcs clusters.

-- 
Valentin



Bug#911801: [Debian-ha-maintainers] Bug#911801: pacemaker: Cannot complete pcs cluster setup command, returns error HTTP401

2018-10-25 Thread Valentin Vidic
On Thu, Oct 25, 2018 at 05:11:17PM +, Duncan Hare wrote:
> root@greene:/home/duncan# pcs cluster setup --name pacemaker1 pinke greene
> greene: Authorized
> pinke: Authorizedroot@greene:/home/duncan#root@greene:/home/duncan# pcs 
> cluster setup --name pacemaker1 pinke greene --force
> Destroying cluster on nodes: pinke, greene...
> pinke: Unable to authenticate to pinke - (HTTP error: 401), try running 'pcs 
> cluster auth'
> greene: Unable to authenticate to greene - (HTTP error: 401), try running 
> 'pcs cluster auth'
> pinke: Unable to authenticate to pinke - (HTTP error: 401), try running 'pcs 
> cluster auth'
> greene: Unable to authenticate to greene - (HTTP error: 401), try running 
> 'pcs cluster auth'
> Error: unable to destroy cluster
> greene: Unable to authenticate to greene - (HTTP error: 401), try running 
> 'pcs cluster auth'
> pinke: Unable to authenticate to pinke - (HTTP error: 401), try running 'pcs 
> cluster auth'
> root@greene:/home/duncan#
> 
> this works: rm /etc/corosync/corosync.conf
> 
> Debian Bug report logs - #847295
> pcs cluster setup does not overwrite existing config files, and the n the 
> cluster create fails.

Yes, I think removing corosync.conf is documented in README.Debian:

As PCS expects Corosync and Pacemaker to be in unconfigured state,
the following command needs to be executed on all cluster nodes to
stop the services and delete their default configuration:

  # pcs cluster destroy
  Shutting down pacemaker/corosync services...
  Killing any remaining services...
  Removing all cluster configuration files...

-- 
Valentin



Bug#911801: [Debian-ha-maintainers] Bug#911801: pacemaker: Cannot complete pcs cluster setup command, returns error HTTP401

2018-10-25 Thread Valentin Vidic
On Wed, Oct 24, 2018 at 05:19:02PM -0700, Duncan Hare wrote:
> Package: pacemaker
> Version: 1.1.16-1
> Severity: grave
> Justification: causes non-serious data loss

I've reassigned this to the pcs package, since it probably doesn't have
anything to do with pacemaker, but I'm not sure what is going on here.
Can you provide some more info on the problem and the pcs commands that
were used so I can try to reproduce it?

Also, perhaps the README.Debian included in the pcs package could help
if this is an initial installation of the cluster.

-- 
Valentin



Bug#911177: [PATCH] dlm: Toplevel Makefile always returns success

2018-10-16 Thread Valentin Vidic
Check exit codes from each of the subdirectories.
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index dd29bcea..ab069a1c 100644
--- a/Makefile
+++ b/Makefile
@@ -1,2 +1,2 @@
 all install clean: %:
-	for d in libdlm dlm_controld dlm_tool fence; do $(MAKE) -C $$d $@; done
+	set -e; for d in libdlm dlm_controld dlm_tool fence; do $(MAKE) -C $$d $@; done
-- 
2.19.0
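The difference the one-liner makes can be seen with a toy loop (plain sh, no make involved): without set -e, the loop's exit status is only that of its last iteration.

```shell
# A failing first iteration is masked when the last iteration succeeds...
sh -c 'for d in 1 2; do [ "$d" = 2 ]; done'
echo "without set -e: $?"

# ...but with set -e the shell aborts on the first failure.
sh -c 'set -e; for d in 1 2; do [ "$d" = 2 ]; done'
echo "with set -e: $?"
```

This prints `without set -e: 0` and `with set -e: 1`, which is exactly why the unpatched Makefile kept building after a subdirectory failed.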



Bug#911177: [Debian-ha-maintainers] Bug#911177: dlm: does not trap errors from make

2018-10-16 Thread Valentin Vidic
On Tue, Oct 16, 2018 at 09:22:22PM +0200, Helmut Grohne wrote:
> The upstream Makefile runs submakes in a for loop without any error
> trapping. Thus it continues building even in the presence of failures.
> Doing so violates the Debian policy section 4.6. The recommended
> solution is adding "set -e".

Confirmed, I will send a patch upstream...

-- 
Valentin



Bug#902117: [Debian-ha-maintainers] Bug#902117: corosync-qdevice will not daemonize/run

2018-07-23 Thread Valentin Vidic
On Fri, Jul 06, 2018 at 12:50:42PM +0200, Ferenc Wágner wrote:
> Thanks for the report.  I've been pretty busy with other tasks, but I'll
> check this out as soon as possible, your report isn't forgotten.  I ask
> for you patience till then.

Feri, do you still want to check this, or should we close this issue?

-- 
Valentin



Bug#902117: [Debian-ha-maintainers] Bug#902117: corosync-qdevice will not daemonize/run

2018-07-07 Thread Valentin Vidic
On Fri, Jul 06, 2018 at 04:43:00PM -0400, Jason Gauthier wrote:
> Now, it's entirely possible that I do have a configuration issue
> causing corosync-qdevice to not start.  However, the real issue is
> that corosync-qdevice does not log anything to stdout when run with
> "-f -d"  (foreground, debug).

Just tried this on unstable and you are right, there is no output
for "-f -d", but I do get this in daemon.log:

Jul  7 14:29:51 sid1 corosync-qdevice[1507]: Configuring qdevice
Jul  7 14:29:51 sid1 corosync-qdevice[1507]: Can't read quorum.device.model cmap key.
Jul  7 14:29:55 sid1 corosync-qdevice[1511]: Initializing votequorum
Jul  7 14:29:55 sid1 corosync-qdevice[1511]: shm size:1048589; real_size:1052672; rb->word_size:263168
Jul  7 14:29:55 sid1 corosync-qdevice[1511]: shm size:1048589; real_size:1052672; rb->word_size:263168
Jul  7 14:29:55 sid1 corosync-qdevice[1511]: shm size:1048589; real_size:1052672; rb->word_size:263168
Jul  7 14:29:55 sid1 corosync-qdevice[1511]: Initializing local socket
Jul  7 14:29:55 sid1 corosync-qdevice[1511]: Registering qdevice models
Jul  7 14:29:55 sid1 corosync-qdevice[1511]: Configuring qdevice
Jul  7 14:29:55 sid1 corosync-qdevice[1511]: Can't read quorum.device.model cmap key.

Maybe stdout does not exist for this service or you need to
tune this part of corosync.conf:

logging {
fileline:   off
to_stderr:  no
to_logfile: no
logfile:/var/log/corosync/corosync.log
to_syslog:  yes
debug:  off
timestamp:  on
logger_subsys {
subsys: QUORUM
debug:  off
}
}

-- 
Valentin



Bug#902117: [Debian-ha-maintainers] Bug#902117: corosync-qdevice will not daemonize/run

2018-07-06 Thread Valentin Vidic
On Fri, Jun 22, 2018 at 09:46:36AM -0400, Jason Gauthier wrote:
> corosync-qdevice is a daemon that runs on each cluster node that help
> provide a voting subsystem that utilizes corosync-qnet outside the
> cluster.
> 
> After installing the packages from debian stretch, and configuring the
> application, it does not run.  One can use -d and -f to troubleshoot
> issues, and even in this situation no data is logged to the console,
> or any syslog messages generated.  The application immediately fails.

corosync-qdevice is configured in corosync.conf; can you share the
quorum block from there?

If this is not configured I get the following error:

Jul  6 16:08:08 node1 corosync-qdevice[2778]: Can't read quorum.device.model cmap key.

But with a correct configuration it starts fine for me.

-- 
Valentin



Bug#901100: [Debian-ha-maintainers] Bug#901100: cluster-glue-dev: missing Breaks+Replaces: cluster-glue (<< 1.0.12-8)

2018-06-09 Thread Valentin Vidic
On Sat, Jun 09, 2018 at 02:41:24PM +0200, Ferenc Wágner wrote:
> Are those .a (and .la) files really needed?  We mostly avoid shipping
> them, see:
> http://www.debian.org/doc/debian-policy/ch-sharedlibs.html#s-sharedlibs-dev
> https://www.debian.org/doc/manuals/maint-guide/advanced.en.html#library
> https://wiki.debian.org/ReleaseGoals/LAFileRemoval

Yes, we should probably investigate whether those static libs, and
cluster-glue as a whole, are still used these days and remove what is
not needed anymore...

-- 
Valentin



Bug#900332: [Debian-ha-maintainers] Bug#900332: cluster-glue: FTBFS: error: unknown type name 'selector_t'; did you mean 'sel_timer_t'?

2018-05-29 Thread Valentin Vidic
On Tue, May 29, 2018 at 09:54:13AM +0200, Emilio Pozuelo Monfort wrote:
> Your package failed to build on a rebuild against libcurl4:
> 
> libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../include 
> -I../../../include -I../../../include -I../../../linux-ha -I../../../linux-ha 
> -I../../../libltdl -I../../../libltdl -Wdate-time -D_FORTIFY_SOURCE=2 
> -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include 
> -I/usr/include/libxml2 -g -O2 -fdebug-prefix-map=/<>=. 
> -fstack-protector-strong -Wformat -Werror=format-security -ggdb 
> -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return 
> -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement 
> -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral 
> -Winline -Wmissing-prototypes -Wmissing-declarations 
> -Wmissing-format-attribute -Wnested-externs -Wno-long-long 
> -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings 
> -ansi -D_GNU_SOURCE -DANSI_ONLY -g -O2 -fdebug-prefix-map=/<>=. 
> -fstack-protector-strong -Wformat -Werror=format-security -ggdb 
> -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return 
> -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement 
> -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral 
> -Winline -Wmissing-prototypes -Wmissing-declarations 
> -Wmissing-format-attribute -Wnested-externs -Wno-long-long 
> -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings 
> -ansi -D_GNU_SOURCE -DANSI_ONLY -MT apcmaster.lo -MD -MP -MF 
> .deps/apcmaster.Tpo -c apcmaster.c  -fPIC -DPIC -o .libs/apcmaster.o
> ipmilan_command.c:52:1: error: unknown type name 'selector_t'; did you mean 
> 'sel_timer_t'?
>  selector_t *os_sel;
>  ^~
>  sel_timer_t
> ipmilan_command.c:87:16: error: unknown type name 'selector_t'; did you mean 
> 'sel_timer_t'?
>  void timed_out(selector_t *sel, sel_timer_t *timer, void *data);
> ^~
> sel_timer_t
> ipmilan_command.c:90:11: error: unknown type name 'selector_t'; did you mean 
> 'sel_timer_t'?
>  timed_out(selector_t  *sel, sel_timer_t *timer, void *data)
>^~
>sel_timer_t

Yes, but this is probably due to a new openipmi version rather than
libcurl4.  Will check...

-- 
Valentin



Bug#897508: [Debian-ha-maintainers] Bug#897508: Bug#897508: gfs2-utils: FTBFS: dh_auto_test: make -j8 -Oline check VERBOSE=1 returned exit code 2

2018-05-04 Thread Valentin Vidic
On Wed, May 02, 2018 at 11:02:55PM +0200, Valentin Vidic wrote:
> Thanks, I can reproduce the errors too and will try to figure it out.
> The build was definitely working less than a month ago when the last
> version was released, so something else in the system has changed.

After updating linux-libc-dev from 4.15.17-1 to 4.16.5-1, mkfs.gfs2
started to segfault due to a change in gfs2_ondisk.h. I've contacted
cluster-de...@redhat.com to see if they have a fix already...

-- 
Valentin



Bug#897508: [Debian-ha-maintainers] Bug#897508: gfs2-utils: FTBFS: dh_auto_test: make -j8 -Oline check VERBOSE=1 returned exit code 2

2018-05-02 Thread Valentin Vidic
On Wed, May 02, 2018 at 10:44:33PM +0200, Lucas Nussbaum wrote:
> During a rebuild of all packages in sid, your package failed to build on
> amd64.

Thanks, I can reproduce the errors too and will try to figure it out.
The build was definitely working less than a month ago when the last
version was released, so something else in the system must have changed.

-- 
Valentin



Bug#880554: [Pkg-xen-devel] Bug#880554: Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64

2018-02-27 Thread Valentin Vidic
On Tue, Feb 27, 2018 at 08:22:50PM +0100, Valentin Vidic wrote:
> Since I can't reproduce it easily anymore I suspect something was
> fixed in the meanwhile.  My original report was for 4.9.30-2+deb9u2
> and since then there seems to be a number of fixes that could be
> related to this:

Just rebooted both dom0 and domU with 4.9.30-2+deb9u2 and the
postgresql domU is having problems right away after boot:

  domid=1: nr_frames=32, max_nr_frames=32

  [  242.652100] INFO: task kworker/u90:0:6 blocked for more than 120 seconds.

Upgrading the kernels and I can't get it above 11 anymore:

  domid=1: nr_frames=11, max_nr_frames=32

So some of those many kernel fixes did the trick and things just
work fine with the newer kernels without raising gnttab_max_frames.

-- 
Valentin



Bug#880554: [Pkg-xen-devel] Bug#880554: Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64

2018-02-27 Thread Valentin Vidic
On Tue, Feb 27, 2018 at 05:05:06PM +0100, Hans van Kranenburg wrote:
> ad 1. Christian, Valentin, can you give more specific info that can help
> someone else to set up a test environment to trigger > 32 values.

I can't touch the original VM that had this issue, so I tried to
reproduce on another host with recent stretch kernels, but without
success.  The maximum number I can get now is nr_frames=11.

Another thing I forgot to mention before is that my VMs were
using DRBD disks. Since DRBD acts like a slow disk it could cause
IO requests to pile up and hit the limit faster.

Since I can't reproduce it easily anymore I suspect something was
fixed in the meanwhile.  My original report was for 4.9.30-2+deb9u2
and since then there seems to be a number of fixes that could be
related to this:

linux (4.9.65-3) stretch; urgency=medium
  * xen/time: do not decrease steal time after live migration on xen
linux (4.9.65-1) stretch; urgency=medium
- swiotlb-xen: implement xen_swiotlb_dma_mmap callback
- xen-netback: Use GFP_ATOMIC to allocate hash
- xen/gntdev: avoid out of bounds access in case of partial
  gntdev_mmap()
- xen/manage: correct return value check on xenbus_scanf()
- xen: don't print error message in case of missing Xenstore entry
- xen/netback: set default upper limit of tx/rx queues to 8
linux (4.9.47-1) stretch; urgency=medium
- nvme: use blk_mq_start_hw_queues() in nvme_kill_queues()
- nvme: avoid to use blk_mq_abort_requeue_list()
- efi: Don't issue error message when booted under Xen
- xen/privcmd: Support correctly 64KB page granularity when mapping
  memory
- xen/blkback: fix disconnect while I/Os in flight
- xen/blkback: don't use xen_blkif_get() in xen-blkback kthread
- xen/blkback: don't free be structure too early
- xen-netback: fix memory leaks on XenBus disconnect
- xen-netback: protect resource cleaning on XenBus disconnect
- swiotlb-xen: update dev_addr after swapping pages
- xen-netfront: Fix Rx stall during network stress and OOM
- [x86] mm: Fix flush_tlb_page() on Xen
- xen-netfront: Rework the fix for Rx stall during OOM and network
  stress
- xen/scsiback: Fix a TMR related use-after-free
- [x86] xen: allow userspace access during hypercalls
- [armhf] Xen: Zero reserved fields of xatp before making hypervisor
  call
- xen-netback: correctly schedule rate-limited queues
- nbd: blk_mq_init_queue returns an error code on failure, not NULL
- xen: fix bio vec merging (CVE-2017-12134) (Closes: #866511)
- blk-mq-pci: add a fallback when pci_irq_get_affinity returns NULL
- xen-blkfront: use a right index when checking requests
linux (4.9.30-2+deb9u4) stretch-security; urgency=high
  * xen: fix bio vec merging (CVE-2017-12134) (Closes: #866511)
linux (4.9.30-2+deb9u3) stretch-security; urgency=high
  * xen-blkback: don't leak stack data via response ring
  * (CVE-2017-10911)
  * mqueue: fix a use-after-free in sys_mq_notify() (CVE-2017-11176)

In fact the original big VM with this problem runs happily with:

  domid=1: nr_frames=11, max_nr_frames=256

so it is quite possible raising the limit is not needed anymore
with the latest stretch kernels.

If no-one else can reproduce this anymore I suggest you close the
issue but include the xen-diag tool in the updated package.  That
way if someone reports the problem again it should be easy to detect.

-- 
Valentin



Bug#888730: [Debian-ha-maintainers] Bug#888730: booth binary-all FTBFS: test failure

2018-01-29 Thread Valentin Vidic
On Mon, Jan 29, 2018 at 11:16:51AM +0200, Adrian Bunk wrote:
> Source: booth
> Version: 1.01.0-5
> Severity: serious
> 
> https://buildd.debian.org/status/fetch.php?pkg=booth&arch=all&ver=1.0-5&stamp=1516927397&raw=0
> 
> ...
> FAILED (failures=1)

The unit tests enabled in the last release are failing for an unknown
reason on some architectures.  I will try to move them from build
time to debian/tests instead.
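As a sketch, moving the tests to autopkgtest would mean adding something
like the following debian/tests/control (the test name is hypothetical; a
matching debian/tests/unit script would run the suite against the installed
package):

```
Tests: unit
Depends: @
Restrictions: allow-stderr
```

Unlike build-time tests, a failure here shows up in the CI results rather
than blocking the build on every architecture.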

-- 
Valentin



Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64

2018-01-15 Thread Valentin Vidic
On Mon, Jan 15, 2018 at 11:12:03AM +0100, Christian Schwamborn wrote:
> Is there a easy way to get/monitor the used 'grants' frames? As I understand
> it, the xen-diag tool you mentioned doesn't compile in xen 4.8?

Here is a status from another host:

domid=0: nr_frames=4, max_nr_frames=256
domid=487: nr_frames=6, max_nr_frames=256
domid=488: nr_frames=5, max_nr_frames=256
domid=489: nr_frames=4, max_nr_frames=256
domid=490: nr_frames=6, max_nr_frames=256
domid=491: nr_frames=7, max_nr_frames=256
domid=492: nr_frames=4, max_nr_frames=256
domid=493: nr_frames=4, max_nr_frames=256
domid=494: nr_frames=29, max_nr_frames=256
domid=495: nr_frames=4, max_nr_frames=256
domid=496: nr_frames=4, max_nr_frames=256
domid=497: nr_frames=5, max_nr_frames=256
domid=498: nr_frames=4, max_nr_frames=256
domid=499: nr_frames=4, max_nr_frames=256
domid=500: nr_frames=4, max_nr_frames=256
domid=501: nr_frames=4, max_nr_frames=256
domid=503: nr_frames=5, max_nr_frames=256
domid=572: nr_frames=13, max_nr_frames=256
domid=575: nr_frames=7, max_nr_frames=256

Most of the hosts have older kernels and nr_frames < 10.

And then 494 has a stretch kernel and only 4 vcpus, but is quite close to
the current default of 32.  Maybe it just depends on the amount of disk IO?

-- 
Valentin



Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64

2018-01-15 Thread Valentin Vidic
On Mon, Jan 15, 2018 at 11:12:03AM +0100, Christian Schwamborn wrote:
> Is there a easy way to get/monitor the used 'grants' frames? As I understand
> it, the xen-diag tool you mentioned doesn't compile in xen 4.8?

I just gave it another try and, after modifying xen-diag.c
a bit to work with 4.8, here is what I get:

  # ./xen-diag gnttab_query_size 0
  domid=0: nr_frames=4, max_nr_frames=256
  # ./xen-diag gnttab_query_size 1
  domid=1: nr_frames=11, max_nr_frames=256
  
  # ./xen-diag  gnttab_query_size 0
  domid=0: nr_frames=4, max_nr_frames=256
  # ./xen-diag  gnttab_query_size 1
  domid=1: nr_frames=11, max_nr_frames=256
  # ./xen-diag  gnttab_query_size 5
  domid=5: nr_frames=11, max_nr_frames=256

so currently at 11, not high at all.

Attaching a patch for stretch xen package if you want to check
your hosts.

-- 
Valentin
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -31,6 +31,7 @@
 INSTALL_SBIN   += xenpm
 INSTALL_SBIN   += xenwatchdogd
 INSTALL_SBIN   += xen-livepatch
+INSTALL_SBIN   += xen-diag
 INSTALL_SBIN += $(INSTALL_SBIN-y)
 
 # Everything to be installed in a private bin/
@@ -98,6 +99,9 @@
 xen-livepatch: xen-livepatch.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
+xen-diag: xen-diag.o
+	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
+
 xen-lowmemd: xen-lowmemd.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenevtchn) $(LDLIBS_libxenctrl) $(LDLIBS_libxenstore) $(APPEND_LDFLAGS)
 
--- /dev/null
+++ b/tools/misc/xen-diag.c
@@ -0,0 +1,129 @@
+/*
+ * Copyright (c) 2017 Oracle and/or its affiliates. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+static xc_interface *xch;
+
+#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))
+
+void show_help(void)
+{
+fprintf(stderr,
+"xen-diag: xen diagnostic utility\n"
+"Usage: xen-diag command [args]\n"
+"Commands:\n"
+"  help   display this help\n"
+"  gnttab_query_size   dump the current and max grant frames for \n");
+}
+
+/* wrapper function */
+static int help_func(int argc, char *argv[])
+{
+show_help();
+return 0;
+}
+
+static int gnttab_query_size_func(int argc, char *argv[])
+{
+int domid, rc = 1;
+struct gnttab_query_size query;
+
+if ( argc != 1 )
+{
+show_help();
+return rc;
+}
+
+domid = strtol(argv[0], NULL, 10);
+query.dom = domid;
+rc = xc_gnttab_op(xch, GNTTABOP_query_size, &query, sizeof(query), 1);
+
+if ( rc == 0 && (query.status == GNTST_okay) )
+printf("domid=%d: nr_frames=%d, max_nr_frames=%d\n",
+   query.dom, query.nr_frames, query.max_nr_frames);
+
+return rc == 0 && (query.status == GNTST_okay) ? 0 : 1;
+}
+
+struct {
+const char *name;
+int (*function)(int argc, char *argv[]);
+} main_options[] = {
+{ "help", help_func },
+{ "gnttab_query_size", gnttab_query_size_func},
+};
+
+int main(int argc, char *argv[])
+{
+int ret, i;
+
+/*
+ * Set stdout to be unbuffered to avoid having to fflush when
+ * printing without a newline.
+ */
+setvbuf(stdout, NULL, _IONBF, 0);
+
+if ( argc <= 1 )
+{
+show_help();
+return 0;
+}
+
+for ( i = 0; i < ARRAY_SIZE(main_options); i++ )
+if ( !strncmp(main_options[i].name, argv[1], strlen(argv[1])) )
+break;
+
+if ( i == ARRAY_SIZE(main_options) )
+{
+show_help();
+return 0;
+}
+else
+{
+xch = xc_interface_open(0, 0, 0);
+if ( !xch )
+{
+fprintf(stderr, "failed to get the handler\n");
+return 0;
+}
+
+ret = main_options[i].function(argc - 2, argv + 2);
+
+xc_interface_close(xch);
+}
+
+/*
+ * Exitcode 0 for success.
+ * Exitcode 1 for an error.
+ * Exitcode 2 if the operation should be retried for any reason (e.g. a
+ * timeout or because another operation was in progress).
+ */
+
+#define EXIT_TIMEOUT (EXIT_FAILURE + 1)
+
+BUILD_BUG_ON(EXIT_SUCCESS != 0);
+BUILD_BUG_ON(EXIT_FAILURE != 1);
+BUILD_BUG_ON(EXIT_TIMEOUT != 2);
+
+switch ( ret )
+{
+case 0:
+return EXIT_SUCCESS;
+case EAGAIN:
+case EBUSY:
+return EXIT_TIMEOUT;
+default:
+return EXIT_FAILURE;
+}
+}


Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64

2018-01-12 Thread Valentin Vidic
On Fri, Jan 12, 2018 at 01:34:10AM +0100, Hans van Kranenburg wrote:
> Is the 59 your lots-o-vcpu-monster?

Yes, that is the one with a larger vcpu count.

> I just finished with the initial preparation of a Xen 4.10 package for
> unstable and have it running in my test environment.

Unrelated to this issue, but can you tell me if there is a way to
mitigate Meltdown with the Xen 4.8 dom0/domU(PV) running stretch?

> Since this has been reported multiple times already, and upstream has
> bumped it to 64, my verdict would be:
> 
> * Bump default to 64 already like upstream did in a later version.
> * Properly document this issue in NEWS.Debian and also mention the
> option with documentation in the template grub config file, so there's a
> bigger chance users who run unusual big numbers of disks/nics/cpus/etc
> will find it.
> 
> ...so we also better accomodate users who are using newer kernels in the
> domU with blk-mq, and prevent them from wasting too much time and
> getting frustrated for no reason.
> 
> I wouldn't be comfortable with bumping it above the current latest
> greatest upstream default, since it would mean we would need to keep a
> patch in later versions.
> 
> I'll prepare a patch to bump the default to 64 in 4.8, taking changes
> from the upstream patch. I probably have to ask upstream (Juergen Gross)
> why the commit that was referenced earlier bumps the default without
> mentioning it in the commit message.

Thanks, 64 should be a good start.  If there are still problems
reported with that it can be reconsidered.

-- 
Valentin



Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64

2018-01-08 Thread Valentin Vidic
On Sun, Jan 07, 2018 at 07:36:40PM +0100, Hans van Kranenburg wrote:
> Recently a tool was added to "dump guest grant table info". You could
> see if it compiles on the 4.8 source and see if it works? Would be
> interesting to get some idea about how high or low these numbers are in
> different scenarios. I mean, I'm using 128, you 256, and we even don't
> know if the actual value is maybe just above 32? :]
> 
> https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=df36d82e3fc91bee2ff1681fd438c815fa324b6a

The diag tool does not build inside xen-4.8:

xen-diag.c: In function ‘gnttab_query_size_func’:
xen-diag.c:50:10: error: implicit declaration of function 
‘xc_gnttab_query_size’ [-Werror=implicit-function-declaration]
 rc = xc_gnttab_query_size(xch, &query);
  ^~~~

but I think the same info is available in the thread on xen-devel:

  https://www.mail-archive.com/xen-devel@lists.xen.org/msg116910.html

When the domU hangs, crash reports nr_grant_frames=32. After increasing
gnttab_max_frames to 256, the domU reports using nr_grant_frames=59.

So the new default of gnttab_max_frames=64 might be a bit close to 59,
but I suppose 128 would be just as safe as 256 I currently use (if
you prefer 128).

> If this is something users are going to run into while not doing more
> unusual things like having dozens of vcpus or network interfaces, then
> changing the default could prevent hours of frustration and debugging
> for them.

Yes, the failure case is quite nasty, as the domU just hangs without
even suggesting that grant frames might be the problem. Not sure if the
domU can detect this situation at all?

Anyway, if the value cannot be increased, the situation should at least
be mentioned in the NEWS.Debian of the xen package.

-- 
Valentin



Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64

2018-01-07 Thread Valentin Vidic
On Sat, Jan 06, 2018 at 11:17:00PM +0100, Hans van Kranenburg wrote:
> I agree that the upstream default, 32 is quite low. This is indeed a
> configuration issue. I myself ran into this years ago with a growing
> number of domUs and network interfaces in use. We have been using
> gnttab_max_nr_frames=128 for a long time already instead.
> 
> I was tempted to reassign src:xen, but in the meantime, this option has
> already been removed again, so this bug does not apply to unstable
> (well, as soon as we get something new in there) any more (as far as I
> can see quickly now).
> 
> https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=18b1be5e324bcbe2f10898b116db641d404b3d30

It does not seem to have been removed; rather, the default was increased from 32 to 64?

> Including a better default for gnttab_max_nr_frames in the grub config
> in the debian xen package in stable sounds reasonable from a best
> practices point of view.
> 
> But, I would be interested in learning more about the relation with
> block mq although. Does using newer linux kernels (like from
> stretch-backports) for the domU always put a bigger strain on this? Or,
> is it just related to the overall number of network devices and block
> devices you are adding to your domUs in your specific own situation, and
> did you just trip over the default limit?

After upgrading the domU and dom0 from jessie to stretch on a big postgresql
database server (50 VCPUs, 200GB RAM), it started freezing very soon
after boot, as posted here:

  https://lists.xen.org/archives/html/xen-users/2017-07/msg00057.html

It did not have these problems while running the jessie versions of the
hypervisor and the kernels.  The problem seems to be related to the
number of CPUs used, as smaller domUs with a few VCPUs did not hang
like this.  Could it be that a large number of VCPUs -> more queues in
the Xen mq driver -> faster exhaustion of allocated pages?

-- 
Valentin



Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64

2018-01-06 Thread Valentin Vidic
On Sat, Jan 06, 2018 at 03:08:26PM +0100, Yves-Alexis Perez wrote:
> According to that link, the fix seems to be configuration rather than code.
> Does this mean this bug against the kernel should be closed?

Yes, the problem seems to be in the Xen hypervisor and not the Linux
kernel itself.  The default value for the gnttab_max_frames parameter
needs to be increased to avoid domU disk IO hangs, for example:

  GRUB_CMDLINE_XEN="dom0_mem=10240M gnttab_max_frames=256"

So either close the bug or reassign it to xen-hypervisor package so
they can increase the default value for this parameter in the
hypervisor code.

-- 
Valentin



Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64

2017-11-16 Thread Valentin Vidic
Hi,

The problem seems to be caused by the new multi-queue xen blk driver
and I was advised by the Xen devs to increase the gnttab_max_frames=256
parameter for the hypervisor.  This has solved the blocking issue
for me and it has been running without problems for a few months now.

I/O to LUNs hang / stall under high load when using xen-blkfront
https://www.novell.com/support/kb/doc.php?id=7018590

-- 
Valentin



Bug#753235: closing 753235

2017-09-03 Thread Valentin Vidic
close 753235 
thanks



Bug#869986: [Debian-ha-maintainers] Bug#869986: Bug#869986: pacemaker FTBFS: missing symbols

2017-08-07 Thread Valentin Vidic
On Mon, Aug 07, 2017 at 02:31:57PM -0400, Ferenc Wágner wrote:
> There's no problem with the Pacemaker libs, the "missing" symbols are a
> manifestation of the binutils incompatibility in the libqb headers.

Ok, I didn't realize the pacemaker FTBFS was caused by the libqb problem.
Even better, then we only have one nasty bug to squash :)

-- 
Valentin



Bug#869986: [Debian-ha-maintainers] Bug#869986: Bug#869986: pacemaker FTBFS: missing symbols

2017-08-07 Thread Valentin Vidic
On Mon, Aug 07, 2017 at 09:31:22AM -0400, Ferenc Wágner wrote:
> Absolutely, thanks for this very good find, Valentin!  These symbols
> caused problems on non-x86 architectures before, and now libqb is broken
> for good (so we should probably merge this into #871153).  Let's see
> what upstream comes up with.  Till now i couldn't wrap my head around
> the orphan section linker magic, now their struggle might shed some
> light on the point of all this...

Right, the upstream is having problems with libqb, but maybe they don't
see the problem with pacemaker libs if they are not checking the
exported symbols.  Do you know if these start/stop symbols were used
anywhere or it would be safe to drop them from the pacemaker libs?

#MISSING: 1.1.17-1# (arch=!powerpc !powerpcspe !ppc64 !ppc64el)__start___verbose@Base 1.1.12
#MISSING: 1.1.17-1# (arch=!powerpc !powerpcspe !ppc64 !ppc64el)__stop___verbose@Base 1.1.12

-- 
Valentin



Bug#869986: [Debian-ha-maintainers] Bug#869986: pacemaker FTBFS: missing symbols

2017-08-03 Thread Valentin Vidic
On Fri, Jul 28, 2017 at 04:14:47PM +0300, Adrian Bunk wrote:
> Source: pacemaker
> Version: 1.1.17-1
> Severity: serious
> 
> Some recent change in unstable makes pacemaker FTBFS:
> 
> https://tests.reproducible-builds.org/debian/history/pacemaker.html
> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/pacemaker.html

Seems to be related to binutils 2.29 problem reported here:

  https://bugzilla.redhat.com/show_bug.cgi?id=1477354

-- 
Valentin



Bug#857368: [Debian-ha-maintainers] Bug#857368: heartbeat: Heartbeat-Package ist missing package "net-tools" as depency due the need of command "ifconfig".

2017-03-11 Thread Valentin Vidic
On Fri, Mar 10, 2017 at 04:31:33PM +0100, Ronny Schneider wrote:
> as i installed the heartbeat-package in Debian Stretch, heartbeat failed to
> set the ip adresses, since the command "ifconfig" cannot be run. This command
> is part of the "net-tools"-Package and thus needed until heartbeat is patched
> to use the new "ip" command instead. Until this isn't done, the package
> "net-tools" needs to be referenced as a depency.

I can reproduce the problem, but the error is coming from the resource-agents
package (/usr/lib/ocf/resource.d/heartbeat/IPaddr), so I will reassign it there.

Mar 11 16:32:04 sid1 ResourceManager(default)[17494]: info: Acquiring resource 
group: sid1 192.168.122.200
Mar 11 16:32:04 sid1 
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.122.200)[17519]: INFO: 
 Resource is stopped
Mar 11 16:32:04 sid1 ResourceManager(default)[17494]: info: Running 
/etc/ha.d//resource.d/IPaddr 192.168.122.200 start
Mar 11 16:32:04 sid1 IPaddr(IPaddr_192.168.122.200)[17573]: ERROR: Setup 
problem: couldn't find command: ifconfig
Mar 11 16:32:04 sid1 
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.122.200)[17561]: 
ERROR:  Program is not installed
Mar 11 16:32:04 sid1 ResourceManager(default)[17494]: ERROR: Return code 5 from 
/etc/ha.d//resource.d/IPaddr
Mar 11 16:32:04 sid1 ResourceManager(default)[17494]: CRIT: Giving up resources 
due to failure of 192.168.122.200
Mar 11 16:32:04 sid1 ResourceManager(default)[17494]: info: Releasing resource 
group: sid1 192.168.122.200
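The agent fails early because it probes for the binary before doing any
work.  A rough sketch of that kind of probe (have_binary is a hypothetical
stand-in for the check_binary helper in ocf-shellfuncs; the missing-tool
name below is deliberately bogus):

```shell
#!/bin/sh
# Return success only if the named command is available in PATH.
have_binary() {
    command -v "$1" >/dev/null 2>&1
}

# A shell that is always present passes the probe.
have_binary sh && echo "found: sh"

# A binary that is absent triggers the same kind of setup error
# the IPaddr agent logs for ifconfig.
have_binary ifconfig-missing-xyz || \
    echo "Setup problem: couldn't find command: ifconfig-missing-xyz"
```

Depending on net-tools (as the reporter suggests) or porting the agent to
iproute2's ip command would both make this probe succeed.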

-- 
Valentin



Bug#818961: [Debian-ha-maintainers] Bug#818961: Freeze status, Heartbeat plans

2016-12-21 Thread Valentin Vidic
On Wed, Dec 21, 2016 at 03:32:39PM +0100, Valentin Vidic wrote:
> node1 IPaddr2::192.168.122.101/24/ens3 drbddisk::drbd0 LVM::cluster 
> Filesystem::/dev/cluster/srv::/srv::ext4 mysql 
> apache::/etc/apache2/apache2.conf

Also found the following problem in resource-agents but
again not related to systemd :)

  https://github.com/ClusterLabs/resource-agents/pull/905

-- 
Valentin



Bug#818961: [Debian-ha-maintainers] Freeze status, Heartbeat plans

2016-12-21 Thread Valentin Vidic
On Wed, Dec 21, 2016 at 02:24:00PM +0100, Patrick Matthäi wrote:
> DRBD+lvm+ext4, apache and mariadb should be enough

IMHO this setup is too complex for v1, but even that
seems to work for me:

node1 IPaddr2::192.168.122.101/24/ens3 drbddisk::drbd0 LVM::cluster 
Filesystem::/dev/cluster/srv::/srv::ext4 mysql apache::/etc/apache2/apache2.conf

The only problem I found is during reboot drbd service
is not started and I can't enable it but this might be
an issue with drbd-utils package:

# systemctl enable drbd
drbd.service is not a native service, redirecting to systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable drbd
update-rc.d: error: drbd Default-Start contains no runlevels, aborting.
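update-rc.d aborts here because the Default-Start line in the init script's
LSB header lists no runlevels.  For comparison, a header that update-rc.d
accepts looks something like this (the runlevels shown are the usual
defaults for a boot-time service, not necessarily what drbd-utils should
ship):

```
### BEGIN INIT INFO
# Provides:          drbd
# Required-Start:    $local_fs $network $syslog
# Required-Stop:     $local_fs $network $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Control DRBD resources
### END INIT INFO
```

An intentionally empty Default-Start is how packages mark a script as not
enabled by default, but then systemd-sysv-install cannot enable it either.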

-- 
Valentin



Bug#818961: [Debian-ha-maintainers] Freeze status, Heartbeat plans

2016-12-21 Thread Valentin Vidic
On Wed, Dec 21, 2016 at 01:15:12PM +0100, Patrick Matthäi wrote:
> We tried out many test cases, workarounds, debugging on this issue and
> the result was, that the v1 mode of heartbeat can not deal with the
> dependency system of systemd and will never support it.

Not sure what could be the problem with systemd dependencies.
Do you remember which services were involved so we can do a
quick test?

-- 
Valentin



Bug#796638: [Debian-ha-maintainers] Bug#796638: Patches

2016-07-07 Thread Valentin Vidic
On Wed, Jul 06, 2016 at 04:53:05PM +0200, Christian Hofstaedtler wrote:
> Yes, but that is still better than not working at all.

Ok, I'll try to create a service file for the o2cb cluster stack
that is not too complicated.

The pacemaker cluster stack does not need the service files, as
it starts controld and the ocfs2 filesystem from resource scripts.
-- 
Valentin



Bug#823583: gfs2-utils: FTBFS: libgfs2 unit tests [..] FAILED (libgfs2.at:4)

2016-05-10 Thread Valentin Vidic
On Fri, May 06, 2016 at 10:21:33AM +0100, Chris Lamb wrote:
>   libgfs2 unit tests
>   
>24: meta.c  FAILED (libgfs2.at:4)
>25: rgrp.c  ok

The build test fails for me on unstable too, due to a 2-byte increase in
a structure size.  Was there some recent compiler or library change in
unstable that could have caused this?


## - ##
## Test results. ##
## - ##

ERROR: All 25 tests were run,
1 failed unexpectedly.

##  ##
## Summary of the failures. ##
##  ##
Failed tests:
gfs2-utils master test suite test groups:

 NUM: FILE-NAME:LINE TEST-GROUP-NAME
  KEYWORDS

  24: libgfs2.at:3   meta.c
  libgfs2

## -- ##
## Detailed failed tests. ##
## -- ##

# -*- compilation -*-
24. libgfs2.at:3: testing meta.c ...
./libgfs2.at:3: test x"$ENABLE_UNIT_TESTS" = "xyes" || exit 77
./libgfs2.at:4: check_meta
stderr:
gfs2_dirent: __pad: offset is 28, expected 26
gfs2_dirent: size mismatch between struct 40 and fields 38
stdout:
Running suite(s): libgfs2
0%: Checks: 1, Failures: 1, Errors: 0
check_meta.c:9:F:Meta:test_lgfs2_meta:0: Assertion 'lgfs2_selfcheck() == 0' 
failed
./libgfs2.at:4: exit code was 1, expected 0
24. libgfs2.at:3: 24. meta.c (libgfs2.at:3): FAILED (libgfs2.at:4)

-- 
Valentin



Bug#711628: libhttp-daemon-ssl-perl: FTBFS: test failure

2015-09-29 Thread Valentin Vidic
On Sat, Jun 08, 2013 at 01:02:56PM +0100, Dominic Hargreaves wrote:
> Source: libhttp-daemon-ssl-perl
> Version: 1.04-3
> Severity: serious
> Justification: FTBFS
> 
> This package FTBFS (in a clean sid sbuild session):
> 
> Can't call method "get_request" on an undefined value at t/testmodule.t line 
> 90.
> t/testmodule.t .. 
> Dubious, test returned 255 (wstat 65280, 0xff00)
> Failed 1/2 test programs. 2/10 subtests failed.
> Failed 4/9 subtests 

The package seems to work, but the test case needs some
tweaks to work with more recent SSL libs (patch attached).

-- 
Valentin
--- t/testmodule.t	2008-02-12 02:27:01.0 +0100
+++ t/testmodule2.t	2015-09-29 14:07:23.792135915 +0200
@@ -36,6 +36,7 @@
 
 $client = new IO::Socket::SSL(PeerAddr => $SSL_SERVER_ADDR,
   PeerPort => $SSL_SERVER_PORT,
+  SSL_version => 'TLSv1',
   SSL_verify_mode => 0x01,
   SSL_ca_file => "certs/test-ca.pem");
 
@@ -58,7 +59,7 @@
    Timeout => 30,
    ReuseAddr => 1,
    SSL_verify_mode => 0x00,
-   SSL_ca_file => "certs/test-ca.pem",
+   SSL_key_file => "certs/server-key.pem",
    SSL_cert_file => "certs/server-cert.pem");
 
 if (!$server) {


Bug#698118: asterisk 1:1.6.2.9-2+squeeze9 segfaults

2013-01-14 Thread Valentin Vidic
Same thing here, started segfaulting after an upgrade this morning:

2013-01-14 10:32:13 upgrade asterisk 1:1.6.2.9-2+squeeze8 1:1.6.2.9-2+squeeze9

[783312.661049] asterisk[27654]: segfault at 1 ip b748db77 sp b5319684 error 4 
in libc-2.11.3.so[b7418000+14]
[783442.211589] asterisk[13070]: segfault at 1 ip b74a0b77 sp b532d684 error 4 
in libc-2.11.3.so[b742b000+14]
[787731.493578] asterisk[13304]: segfault at 1 ip b74b6b77 sp b5344684 error 4 
in libc-2.11.3.so[b7441000+14]
[787933.505841] asterisk[937]:   segfault at 1 ip b750fb77 sp b539c684 error 4 
in libc-2.11.3.so[b749a000+14]
[788010.077989] asterisk[2168]:  segfault at 1 ip b745db77 sp b52eb684 error 4 
in libc-2.11.3.so[b73e8000+14]
[788592.836440] asterisk[2359]:  segfault at 1 ip b7550b77 sp b53dc684 error 4 
in libc-2.11.3.so[b74db000+14]
[792704.434687] asterisk[6096]:  segfault at 1 ip b746cb77 sp b52fa684 error 4 
in libc-2.11.3.so[b73f7000+14]
[793003.009440] asterisk[25102]: segfault at 1 ip b751db77 sp b53a9684 error 4 
in libc-2.11.3.so[b74a8000+14]

-- 
Valentin

