Re: Bug#1016937: atop: autopkgtest regression on arm64 and armhf and times out on s390x

2022-08-15 Thread Marc Haber
On Sat, Aug 13, 2022 at 10:08:26PM +0200, Paul Gevers wrote:
> On 13-08-2022 21:34, Marc Haber wrote:
> > > running atop from unstable also hangs:
> > > root@elbrus:~# atop
> > > ^C
> > 
> > on zelenka, running the atop binary just works fine. Installing atop
> > 2.7.1-2 in a DD chroot on zelenka also works fine, and the binary is ok
> > as well. However, the chroots don't start the services.
> 
> Progress.
> 
> Now, instead of killing it, I sent it to the background and when I then take
> it to the foreground, it works as expected.

The problem is that installing the package starts atopacct, which takes
a system-wide semaphore and then stalls. atop tries to take the same
semaphore and stalls as well.
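
A stall like this can be told apart from a slow run by putting a deadline on the command. The following is a minimal sketch, not part of the atop package: the helper name run_with_deadline is hypothetical, and it relies only on timeout(1) from GNU coreutils, which exits with status 124 when it has to terminate the command. The commented-out ipcs call assumes the semaphore in question is a System V one, which this thread does not confirm.

```shell
#!/bin/bash
# Hypothetical helper (not part of atop): run a command under a deadline
# and report whether it finished on its own or had to be terminated.
run_with_deadline() {
    local seconds="$1"
    shift
    # timeout(1) exits with status 124 when the deadline expires.
    timeout "$seconds" "$@" >/dev/null 2>&1
    local status=$?
    if [ "$status" -eq 124 ]; then
        echo "hang"
    else
        echo "finished (exit status $status)"
    fi
}

# Example invocations, commented out so the sketch has no side effects:
# run_with_deadline 30 atop -P cpu 5 1
# ipcs -s    # assumption: lists the semaphore set if it is a System V one
```

The 30-second deadline is an arbitrary choice that leaves room for the 5-second sample interval used in the test.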

I didn't see that in the beginning because I cannot install the built
package in a DD schroot on zelenka.

I filed this upstream, https://github.com/Atoptool/atop/issues/207

Greetings
Marc

-- 
Marc Haber         | "I don't trust Computers. They | Email address in header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421



Re: Bug#1016937: atop: autopkgtest regression on arm64 and armhf and times out on s390x

2022-08-13 Thread Paul Gevers

Hi,

On 13-08-2022 21:34, Marc Haber wrote:
> > running atop from unstable also hangs:
> > root@elbrus:~# atop
> > ^C
>
> on zelenka, running the atop binary just works fine. Installing atop
> 2.7.1-2 in a DD chroot on zelenka also works fine, and the binary is ok
> as well. However, the chroots don't start the services.


Progress.

Now, instead of killing it, I sent it to the background and when I then 
take it to the foreground, it works as expected.


root@ci-worker-s390x-01:~# atop
^Z
[1]+  Stopped atop
root@ci-worker-s390x-01:~# fg
atop
root@ci-worker-s390x-01:~#


Same with your command in the test:
root@ci-worker-s390x-01:~# atop -P cpu 5 1
^Z
[1]+  Stopped atop -P cpu 5 1
root@ci-worker-s390x-01:~# fg
atop -P cpu 5 1
RESET
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 0 12314475 57940088 197207 116525509 1229493 133423 982033 4278583 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 1 13096470 56792358 204646 118023945 1290960 133142 321874 3737087 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 2 12982530 56925413 209005 117993872 1288573 131703 322564 3746751 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 3 13465982 56697100 208873 117747350 1287548 131114 322660 3739777 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 4 13639265 56795653 213211 117476209 1276394 130964 321365 3747339 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 5 13326756 56460169 202500 118173964 1261805 129906 322232 3723116 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 6 12968736 56176871 207863 118788707 1265701 130806 329336 3732416 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 7 13026985 56068710 211225 118856524 1248204 130943 321583 3736213 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 8 14194105 56997563 204065 116748001 1264309 130682 320834 3740854 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 9 13285438 56060337 205755 118583081 1279057 130206 323123 3733407 0 0 100 0 0

SEP
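
The ^Z / fg sequence above can be reproduced without a terminal, which may help when scripting a reproducer. This is a minimal sketch under assumptions: the helper name stop_and_resume and the STOP_RESUME_DELAY variable are hypothetical, and it sends SIGSTOP rather than the SIGTSTP that ^Z generates, because SIGSTOP cannot be ignored or discarded. signal(7) documents that on Linux semop(2) can fail with EINTR after a process is stopped and then resumed with SIGCONT; whether that is what unblocks atop here is an assumption, not something established in this thread.

```shell
#!/bin/bash
# Hypothetical helper (not part of atop): start a command in the
# background, stop it, resume it, and return its exit status.
stop_and_resume() {
    local delay="${STOP_RESUME_DELAY:-1}"
    "$@" &
    local pid=$!
    sleep "$delay"        # give the command time to start (and block)
    kill -STOP "$pid"     # stand-in for ^Z, which would send SIGTSTP
    sleep "$delay"
    kill -CONT "$pid"     # the signal that "fg" delivers to a stopped job
    wait "$pid"           # propagate the command's exit status
}

# Example invocation, commented out so the sketch has no side effects:
# stop_and_resume atop -P cpu 5 1
```

If the command exits before the first delay has elapsed, the kill calls fail harmlessly and the exit status is still returned by wait.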

Anybody any clue?

Paul




Re: Bug#1016937: atop: autopkgtest regression on arm64 and armhf and times out on s390x

2022-08-12 Thread Paul Gevers

Hi,

[tl;dr: atop seems to hang on s390x]

On 12-08-2022 12:23, Marc Haber wrote:
> On Thu, Aug 11, 2022 at 10:51:32PM +0200, Paul Gevers wrote:
> > On 10-08-2022 12:03, Marc Haber wrote:
> > > Unfortunately, this bug report suffers from multiple cut-and-paste or
> > > template errors. The ci link points to the mercurial page for amd64, the
> > > text alternates between s390x, armhf, arm64 and amd64.
> >
> > There was only one that I'm aware of, the link to mercurial. But I
> > understand it if the text was a bit confusing.
>
> You said autopkgtest fails on amd64, which was never the case. Maybe
> amd64 and arm64 got confused.


What I *wanted* to convey is that arm64 and amd64 *failures* are in our 
RC policy and all other *regressions* are RC too. I did mix that up.



> > > I tried the (dead simple) autopkgtest on the s390x and arm64 porterboxes
> > > and it succeeded in a second's time. I have sharpened the expression
> > > that counts the CPUs in lscpu's output and hope this will fix the issue.
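
The expression actually used in the package's test is not shown in this thread. As an illustration only, one way to count CPUs that avoids matching on lscpu's human-readable output is to use its parsable mode, where header lines start with "#" and every other line describes one CPU. The helper name count_cpus_from_parsable is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not the expression used in the atop test):
# count CPUs from "lscpu -p" style output read on stdin.
# Lines starting with "#" are header comments; each other line is one CPU.
count_cpus_from_parsable() {
    grep -vc '^#'
}

# Example invocation, commented out; it needs lscpu from util-linux:
# lscpu -p=CPU | count_cpus_from_parsable
```

By default lscpu's parsable mode lists online CPUs only, so the result may differ from a count of all configured CPUs.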


> > ooo, CPU count. Yes, some of those archs run on hosts with lots of CPUs.
> > armhf has 160, s390x has 10.


> I am testing locally on amd64 with a machine with 12 CPUs. The armhf
> tests succeed (see
> https://ci.debian.net/data/autopkgtest/testing/armhf/a/atop/24578667/log.gz).


Great, same on arm64. s390x still times out though.


> The complete test is:
> #!/bin/bash
>
> # atop reports number of CPU and two extra lines
> ATOPSOPINION="$(atop -P cpu 5 1 | grep -vE '^(RESET|SEP)' | wc -l)"
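
Only the first part of the test is quoted above. The counting step can be exercised on its own with canned input, as in the following sketch. The function names are hypothetical, and the comparison step is an assumption about what the remainder of the test does rather than a copy of it.

```shell
#!/bin/bash
# Count the per-CPU records in "atop -P cpu" output read on stdin,
# dropping the RESET and SEP marker lines as the quoted pipeline does.
count_atop_cpu_records() {
    grep -vE '^(RESET|SEP)' | wc -l
}

# Hypothetical comparison step: the real test's second half is not shown
# in this thread, so this only illustrates the idea of comparing counts.
compare_cpu_counts() {
    local atops_opinion="$1" lscpus_opinion="$2"
    if [ "$atops_opinion" -eq "$lscpus_opinion" ]; then
        echo "match"
    else
        echo "mismatch: atop=$atops_opinion lscpu=$lscpus_opinion"
    fi
}

# Example invocation, commented out so the sketch does not run atop:
# compare_cpu_counts "$(atop -P cpu 5 1 | count_atop_cpu_records)" 10
```

Because the count is taken per line, every CPU record has to arrive on a single line for the numbers to agree.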


When I run `atop` manually (on stable), it doesn't do anything...
root@ci-worker-s390x-01:~# atop
^C

I started up a clean unstable lxc container and installing atop takes 
quite some time between:
Created symlink /etc/systemd/system/timers.target.wants/atop-rotate.timer -> /lib/systemd/system/atop-rotate.timer.
Created symlink /etc/systemd/system/multi-user.target.wants/atop.service -> /lib/systemd/system/atop.service.
Created symlink /etc/systemd/system/multi-user.target.wants/atopacct.service -> /lib/systemd/system/atopacct.service.

and
Could not execute systemctl:  at /usr/bin/deb-systemd-invoke line 145.

running atop from unstable also hangs:
root@elbrus:~# atop
^C


> There is no loop, and nothing that could fail on a big number. In my
> understanding, this could run on a box with 2000 cores and still work.


Except, it doesn't. Seems like atop is seriously broken on s390x on the 
hosts that we have.



> Also, the test does not time out on zelenka when manually invoked in an
> schroot (setting PATH to point to an executable atop is necessary, as it
> does not seem to be possible to install an arbitrary package that is not
> in the archive). Also, the test is successful if invoked after installing
> atop 2.7.1-2 from the archive.


Maybe we need to involve the s390x porters? I put them in CC to already 
draw their attention.


Paul

