Re: Bug#1076309: [s390x] lots of "User process fault: interruption code XXXX"
Hi waldi,

On 24-07-2024 10:57 a.m., Bastian Blank wrote:
> On Thu, Jul 18, 2024 at 08:06:46AM +0200, Paul Gevers wrote:
>> However, that doesn't seem to work on our s390x host as it seems to freeze instead. Is this something known? Something I'm doing wrong (e.g. these options behaving differently on s390x)? Is this an s390x kernel bug?
>
> This now points to a kernel bug, which requires a new kernel first for further debugging.
>
> What does "freeze" mean? Also no sysrq?

By "freeze" I mean that I have no connection anymore. And when I reboot, there's nothing in the journal from after I lost the connection. I don't know yet what sysrq means, so I'll look into that.

> See https://www.kernel.org/doc/html/v5.3/s390/debugging390.html#sysrq, but you need to enable that before.

Maybe tomorrow.

Paul

OpenPGP_signature.asc Description: OpenPGP digital signature
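Since the reply notes that sysrq has to be enabled beforehand, here is a minimal sketch of how one might check its current state. The helper name `sysrq_status` and its optional file argument are assumptions for illustration; only the `/proc/sys/kernel/sysrq` path and the `kernel.sysrq` sysctl key are standard kernel interfaces.

```shell
# Hypothetical helper: report the current sysrq setting.
# The optional argument lets the check run against a scratch file.
sysrq_status() {
    sysrq_file="${1:-/proc/sys/kernel/sysrq}"
    value="$(cat "$sysrq_file")"
    case "$value" in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "restricted (bitmask $value)" ;;
    esac
}

# To enable all sysrq functions until the next reboot (as root):
#   sysctl -w kernel.sysrq=1
```

Running `sysrq_status` without arguments on the host would show whether sysrq was usable before the freeze.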
Re: [s390x] lots of "User process fault: interruption code XXXX"
Hi all,

On Sun, 14 Jul 2024 09:22:32 +0200 Paul Gevers wrote:
> Today I restarted the s390x host of ci.d.n because I lost access.

I have been fighting with the host for several days now, and I think I finally found the culprit. Several days ago I configured the host to do:

# panic kernel on OOM
vm.panic_on_oom=1
# reboot after 10 sec on panic
kernel.panic=10

This is an idea I got from h01ger last year when discussing issues on arm64, and it has been working well there [1]. (See also https://www.debuntu.org/how-to-reboot-on-oom/)

However, that doesn't seem to work on our s390x host as it seems to freeze instead. Is this something known? Something I'm doing wrong (e.g. these options behaving differently on s390x)? Is this an s390x kernel bug?

Paul

PS: the package that triggers this is hisat2. If you look at its history [2] you see that the test was always Terminated (ignoring the run from 2024-07-17), I now assume by the OOM killer. I filed bug 1076524 against hisat2 to tell them they are using an insane amount of memory on s390x.

[1] See e.g. the period around February/March 2024 on https://ci.debian.net/munin/ci-worker-arm64-11/ci-worker-arm64-11/uptime.html where a lot of reboots happened automatically.
[2] https://ci.debian.net/packages/h/hisat2/testing/s390x/
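A minimal sketch of verifying that the two settings from this message are active on a running host. The function name `check_oom_panic` and the `PROC_SYS` override are assumptions made for illustration; the two files under `/proc/sys` are simply the kernel's view of the `vm.panic_on_oom` and `kernel.panic` keys.

```shell
# Hypothetical helper: confirm that the kernel will panic on OOM and
# reboot afterwards. PROC_SYS can point at a scratch directory for testing.
check_oom_panic() {
    proc_sys="${PROC_SYS:-/proc/sys}"
    oom="$(cat "$proc_sys/vm/panic_on_oom")"
    delay="$(cat "$proc_sys/kernel/panic")"
    if [ "$oom" -ge 1 ] && [ "$delay" -gt 0 ]; then
        echo "panic on OOM, reboot after ${delay}s"
    else
        echo "not configured (vm.panic_on_oom=$oom kernel.panic=$delay)"
        return 1
    fi
}
```

This only confirms the configuration; it says nothing about whether the panic path itself works on s390x, which is the open question in the thread.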
Bug#1068509: liferea: autopkgtest on s390x suggests liferea is partially broken there
Package: liferea
Version: 1.15.6-1
X-Debbugs-CC: debian-s390@lists.debian.org
Severity: normal
User: debian-s390@lists.debian.org
Usertag: s390x

Hi,

Today I was debugging the failing autopkgtest of liferea, a graphical RSS reader, on s390x [1]. Previously I already worked around another crash that happens on s390x only. Although I wonder how useful a desktop application like liferea is on such an architecture, I tried to get it to work. Luckily dogtail (the framework I use to drive the test) has screenshot functionality and I noticed that after requesting the help or the FAQ, there is no text in the window that on other architectures shows the content of the help/FAQ. Once the tests for 1.15.6-2 or higher run on ci.debian.net, the screenshots will be part of the artifacts tar that can be downloaded from the web page.

I'm suspecting that something in the lower layers is broken, but I have absolutely no evidence. Filing this bug just for visibility.

For avoidance of doubt, I consider here 3 things (which I suspect are related):
1) the crash in the "liferea.dump()" call is highly suspicious
2) the button for the Help should be the same across architectures
3) the screenshots for the help and FAQ should show the content

Paul

[1] https://ci.debian.net/packages/l/liferea/testing/s390x/
bug 1064423: src:gnome-keyring: FTBFS on s390x
Hi all,

[s390x porters in CC, maybe they can help?]

On Wed, 21 Feb 2024 22:22:54 +0100 Paul Gevers wrote:
> Source: gnome-keyring
>
> The Release Team considers packages that are out-of-sync between testing and unstable for more than 30 days as having a Release Critical bug in testing [1]. Your package src:gnome-keyring has been trying to migrate for 32 days [2]. Hence, I am filing this bug.
>
> The version in unstable (and version 42.2) failed to build on s390x.

By now three uploads in a row of gnome-keyring FTBFS on s390x on Debian buildds (but at least the last ones built on Ubuntu's infrastructure). This is a key package [3], so autoremoval doesn't happen and most likely (I haven't checked) removal of the binary on s390x isn't going to be trivial (otherwise it wouldn't be key). Hence, the issue needs to be solved for gnome-keyring to migrate.

Is there any progress on this?

Paul

[1] https://lists.debian.org/debian-devel-announce/2023/06/msg1.html
[2] https://qa.debian.org/excuses.php?package=gnome-keyring
[3] https://release.debian.org/key-packages.html
s390x qemu support in autopkgtest?
Dear s390x porters,

Recently I added isolation-machine [1] support to ci.debian.net on amd64. At this moment, I think that performance-wise the only other architecture in our infrastructure that qualifies to add it to is s390x. However, autopkgtest-build-qemu doesn't know what to do yet. Would any of you have interest in improving autopkgtest-build-qemu to support building a qemu image on s390x? It currently says this:

root@ci-worker-s390x-01:~# autopkgtest-build-qemu unstable unstable-s390x
Unable to guess an appropriate boot protocol, use --boot to specify

I note that ppc64el support was added in merge request 91 [2]. Obviously we'd also need to be able to run these VMs via autopkgtest-virt-qemu, so details may be missing there too (merge requests related to ppc64el are linked from MR91).

Please note that this is the first time that I'm really working with qemu, so I'm learning as I go. My qemu experience is now several days.

Paul

[1] https://salsa.debian.org/ci-team/autopkgtest/-/blob/master/doc/README.package-tests.rst
[2] https://salsa.debian.org/ci-team/autopkgtest/-/merge_requests/91
Re: help needed to manage s390x host for ci.debian.net
Hi Elizabeth,

On 18-04-2023 22:46, Elizabeth K. Joseph wrote:
> I noticed that the Munin graphs are showing that the queue problems from earlier this year seem to have been reduced now, is that correct, or has the VM just not been restarted lately? It would be helpful to have a starting point.

I was suspecting that after the last communications there were some changes on your side, because I rebooted the VM maybe two or three times [1] and noticed that I didn't see the slowdown I was used to. I also had the impression that some of my other pain points improved a bit.

At this moment Debian is in freeze [2] in preparation for the release of Debian bookworm, hence it's rather dull on the CI side. That makes interpreting the Munin graphs a bit difficult. In the meantime I have also upgraded the VM to run Debian bookworm (around 2023-03-14), maybe that also makes a difference.

I propose we wait with further investigations until after the bookworm release. Once our CI hosts are loaded normally again I think we're in a better position to judge real performance. Having said that, I'm happy to schedule and/or arrange access to an LXC container on our host for investigations already before that time if you want to poke around.

Paul

[1] https://ci.debian.net/munin/ci-worker-s390x-01/ci-worker-s390x-01/uptime.html
[2] https://release.debian.org/testing/freeze_policy.html#summary
Re: help needed to manage s390x host for ci.debian.net
Hi,

On 28-02-2023 01:39, James Addison wrote:
> Attempting to sum together what look, to me, like a pair of 2s:
>
> * The s390x Debian CI queue size[1] is growing again.

Yes, but this time it's because some test seems to be misbehaving (only on s390x or big endian or... ) and fills the disk (and gets killed by a cron job and restarts). I'm suspecting dolfin (and python-oslo.db) at the moment.

> * A recent bug report[2] by Dipak describes userspace processes getting stuck on an s390 Linux kernel version that Debian's CI infra has been using

We reverted to the previous kernel:

root@ci-worker-s390x-01:~# uname -a
Linux ci-worker-s390x-01 5.10.0-20-s390x #1 SMP Debian 5.10.158-2 (2022-12-13) s390x GNU/Linux

Paul
Re: help needed to manage s390x host for ci.debian.net
Hi,

On 21-02-2023 17:46, Dipak Zope1 wrote:
> I am wondering whether we have downgraded the machines to 5.10.0-20 kernel to get rid of the kernel bug.

I think I mentioned it before, we downgraded indeed:

root@ci-worker-s390x-01:~# uname -a
Linux ci-worker-s390x-01 5.10.0-20-s390x #1 SMP Debian 5.10.158-2 (2022-12-13) s390x GNU/Linux

> Can we check if the patch solves any of the CI issues?

If there's a package available somewhere, I can install it, but I currently don't have the time (nor the will, sorry) to learn how to build Debian s390x kernel packages.

Paul
Re: help needed to manage s390x host for ci.debian.net
Hi,

On 13-02-2023 15:59, Dipak Zope1 wrote:
> There is some issue with 5.10.0-21 kernel and we are working on it. This can cause performance impact on CI servers.

I rebooted to the old kernel yesterday. That helps a bit indeed, although most of the issues I reported predate that kernel upgrade.

> As Paul mentioned we have upgraded CI servers to better capacity in May last year. Is today’s performance worse than what we observed right after upgrade?

As you can see e.g. here [1,2] it comes and goes (albeit sometimes the queue was empty). I don't think it's very different, I just never got out of the s390x host what I was expecting. For a long time I blamed it on the "stealing" that happens on a shared host, but I think there's more.

[1] https://ci.debian.net/munin/ci-worker-s390x-01/ci-worker-s390x-01/debci_packages_processed.html
[2] https://ci.debian.net/munin/ci-worker-s390x-01/ci-worker-s390x-01/cpu.html

> Has the performance deteriorated consistently over a period of time or was it suddenly observed? If it is a sudden change, is there any incident, like a change/upgrade in a software or hardware component, that coincides with it?

James Addison suggested in [3] to increase a prefetch counter in amqp (although it's the same on all hosts); I have done so on the s390x host and at least initially it seems to help keep the host busier.

[3] https://salsa.debian.org/ci-team/debci/-/issues/92#note_381306

Paul
Re: help needed to manage s390x host for ci.debian.net
Hi Phil,

On 13-02-2023 08:57, Philipp Kern wrote:
> On 12.02.23 22:38, Paul Gevers wrote:
>> I have munin [1], but as said, I'm not a trained sysadmin. I don't know what I'm looking for if you ask "statistics on the network".
>
> This is more of a software development / devops question than a sysadmin question, but alas.

I acknowledge that my reach out was broad and didn't only cover s390x.

> What I am interested in is *application-level* logging on reconnects. Presumably the connection to RabbitMQ is outbound?

Our configuration can be seen here:
https://salsa.debian.org/ci-team/debian-ci-config/-/blob/master/cookbooks/rabbitmq/templates/rabbitmq.conf.erb

> Is it tunneled? Does your application log somewhere when a reconnect happens? Does it say when it successfully connected? I'd expect good software to log something like this:
>
> [10:00:00] Connecting to broker "rabbitmq.debci.debian.net:12345"...
> [10:00:05] Connected to broker "rabbitmq.debci.debian.net:12345".
>
> And also:
>
> [10:00:00] Connecting to broker "rabbitmq.debci.debian.net:12345"...
> [10:00:01] Connection to broker "rabbitmq.debci.debian.net:12345" failed: Connection refused

@terceiro; I haven't seen these kinds of logs on the worker hosts. Do you know if they exist or if we can generate them?

I think I'm seeing something on the main host.
admin@ci-master:/var/log/rabbitmq$ sudo grep 148.100.88.163 rab...@ci-master.log | grep -v '\[info\]' | grep -v '\[warning\]'
2023-02-14 00:00:37.522 [error] <0.30951.85> closing AMQP connection <0.30951.85> (148.100.88.163:49540 -> 10.1.14.198:5671):
2023-02-14 02:27:56.050 [error] <0.15184.87> closing AMQP connection <0.15184.87> (148.100.88.163:49988 -> 10.1.14.198:5671):
2023-02-14 02:36:05.496 [error] <0.17479.87> closing AMQP connection <0.17479.87> (148.100.88.163:57098 -> 10.1.14.198:5671):
2023-02-14 04:06:13.869 [error] <0.16105.88> closing AMQP connection <0.16105.88> (148.100.88.163:42984 -> 10.1.14.198:5671):
2023-02-14 04:15:27.696 [error] <0.19038.88> closing AMQP connection <0.19038.88> (148.100.88.163:56650 -> 10.1.14.198:5671):
2023-02-14 20:05:38.702 [error] <0.23586.97> closing AMQP connection <0.23586.97> (148.100.88.163:34278 -> 10.1.14.198:5671):

and a lot more warnings (220 times in 20 hours) as well; like:

2023-02-14 20:05:09.011 [warning] <0.20860.97> closing AMQP connection <0.20860.97> (148.100.88.163:45624 -> 10.1.14.198:5671, vhost: '/', user: 'guest'):

And a lot (around 544) (obviously I don't know if that's only or even includes the s390x host):

client unexpectedly closed TCP connection

Paul
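The counts quoted above were gathered by hand with grep. A minimal sketch of tallying the "closing AMQP connection" entries per log level for one peer address, assuming the log line layout shown in this message (date, time, bracketed level). The function name `count_amqp_closes` is made up for illustration and is not part of RabbitMQ or debci.

```shell
# Hypothetical helper: read a RabbitMQ log on stdin and print
# "<level> <count>" for connection closures involving the given peer.
count_amqp_closes() {
    peer="$1"
    awk -v peer="$peer" '
        index($0, peer) && /closing AMQP connection/ {
            level = $3              # e.g. "[error]" or "[warning]"
            gsub(/[][]/, "", level) # strip the square brackets
            count[level]++
        }
        END { for (l in count) print l, count[l] }
    ' | sort
}
```

Usage would be along the lines of `sudo cat <logfile> | count_amqp_closes 148.100.88.163`, with the log file name taken from the host in question.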
Re: help needed to manage s390x host for ci.debian.net
Hi Phil and all others offering help,

On 12-02-2023 20:32, Philipp Kern wrote:
> On 11.02.23 18:18, Paul Gevers wrote:
>> * [suspect 1] network issues between the s390x and the main ci.d.n server (the results (log files) of the autopkgtests are transferred to the main server). Our ppc64el hosts are also located at Marist, so I would expect commonality here, but also ppc64el isn't performing great, so maybe part of the problem is common.
>
> Do you have any kind of statistics on the network connections? I.e. how often it reconnects and how long it takes to reconnect? The Marist network has a very weird firewall inbound (e.g. if I do too many SSH requests in a row, I'm blackholed) - so I would not be surprised if there is some weirdness there.

I have munin [1], but as said, I'm not a trained sysadmin. I don't know what I'm looking for if you ask "statistics on the network". Also, I have no experience with s390x except for deploying the Debian software on the server set up by Phil. All the quirks of s390x are beyond me.

I can provide logging from the host, but I'll need detailed instructions on what people find useful to look at. Recently Antonio taught me a trick to provide temporary access to an lxc container on any of our hosts, so if it helps to be on the host (but inside lxc) we can provide for that.

Paul

[1] https://ci.debian.net/munin/ci-worker-s390x-01/ci-worker-s390x-01/index.html
help needed to manage s390x host for ci.debian.net
Dear all,

This is a call for help, mostly for s390x specifically and debugging our s390x host.

ci.debian.net is the infrastructure that enables the Debian Release Team to run autopkgtest as part of the quality assurance for unstable-to-testing migration. Historically, that used to be exclusively amd64, but in the last two years that has been extended with arm64, armhf, armel, i386, ppc64el and s390x. We have one s390x VM (generously provided by IBM) hosted at Marist College, running 10 debci workers in parallel.

Now there are a couple of things that make me reach out, because I am coming to the conclusion that I can't handle it myself. I have the impression that the s390x host isn't delivering what it's capable of. Instead of asking for more resources (which I believe would be granted), I believe we should first try to see if there aren't items we can fix on the Debian side. There are a couple of observations and ideas that I have.

* [observation] contrary to other architectures the queue (rabbitmq) for s390x doesn't empty anymore. Even if there are no packages being processed and the database doesn't know about pending jobs, there are typically (increasing over time) tests left in the queue.
* [observation] with jobs in the queue, the number of packages being processed [1] often isn't equal to the number of debci workers, meaning they are idle/waiting. Compare that e.g. to our amd64 host #13 [2] where if there's a queue, the number of processed packages is flat at the number of debci workers.
* [observation] we checked the average time a test runs on s390x and it doesn't deviate much from the other architectures, and it's for sure not the worst. However, the number of tests processed per day per debci worker is the lowest of all architectures [3], easily half of i386 which has 11 workers vs 10 on s390x.
* [observation] in general, I believe that we could set up our hosts to use tmpfs for the testbed, because I really believe that installing the packages for the test takes a considerable amount of time and, for short tests, means everything. Most perl tests only take seconds, while preparing the testbed takes multiple tens of seconds. If we would trade disk for memory, I think we could get much more out of any host with the same amount of CPU.
* [suspect 1] network issues between the s390x and the main ci.d.n server (the results (log files) of the autopkgtests are transferred to the main server). Our ppc64el hosts are also located at Marist, so I would expect commonality here, but also ppc64el isn't performing great, so maybe part of the problem is common.
* [idea] maybe the queuing and back reporting in debci could be improved to transfer the logs separately from the result, such that the transfer doesn't block the workers from picking up new tasks.

What drove me here is that several days ago I updated the VM (it runs bullseye) and as part of that a new kernel was installed. After the reboot, it looks like mariadb doesn't want to install anymore in the testbeds (the tests hang and time out). See e.g. https://ci.debian.net/packages/d/dbconfig-common/testing/s390x/ There is a FTBFS in mariadb also related to s390x: #1030510 [4] where it was hinted to be due to a bad kernel.

On top of that, I have observed very often that after a reboot of the host the number of packages processed in the first few days is considerably lower (1/10 or so) than normal. I'm not seeing anything deviating on the system, except that. I have no ideas anymore where to look.

Personally I don't really care about s390x and it's costing me more time than the other architectures. I don't want to anymore.

Paul

PS: I may have forgotten some observations and ideas, but the message is long enough already and I wanted to send it.
[1] https://ci.debian.net/munin/ci-worker-s390x-01/ci-worker-s390x-01/debci_packages_being_processed.html
[2] https://ci.debian.net/munin/ci-worker13/ci-worker13/debci_packages_being_processed.html
[3] https://ci.debian.net/munin/debian.net/ci-master.debian.net/debci_total_packages_processed.html
[4] https://bugs.debian.org/1030510

$ rake capacity
amd64 17
arm64 20
armel 12
armhf 12
i386 11
ppc64el 8
riscv64 18
s390x 10
Re: RE: src:exempi: fails to migrate to testing for too long: FTBFS on s390x
Hi Dipak,

On Tue, 30 Aug 2022 09:57:44 + Dipak Zope1 wrote:
> Apologies for late response. It looks like the issue is related to the synchronization between atop and atopacctd. I am looking into it further and will keep this thread updated.

I think we established that you replied here but had the other bug in mind (in atop).

> I am looking forward to have a fix for this for s390x.

Can you still look into the exempi issue in this bug report?

> On 30/08/22, 12:44 AM, "Paul Gevers" wrote:
>> Hi Michael
>>
>> On 29-08-2022 14:23, Michael Biebl wrote:
>>> As you are probably aware, this issue is known and tracked in [1].
>>
>> Which I added as a blocker and mentioned in my message, so yes.
>>
>>> The package FTBFS after enabling the test suite. I raised this issue upstream but there is no real interest/motivation [2] on their part to address these (most likely endianness related) issues.
>>> So I informed the s390x porters as well but got no feedback so far.
>>
>> Ack, I saw the latter part.
>>
>>> To me it seems it's better to not continue to ship a known broken package on s390x and think a partial architecture removal is probably the better alternative.
>>
>> If you think the package indeed is severely broken, then removal sounds best. If it's broken in some less common use cases, it may be OK to leave it for now (skipping those tests on s390x) and let the porters have a look when they have time.
>>
>>> Let me know what you think
>>
>> It all depends on how broken it is. If you would consider the bugs found by the tests RC, then removal is the better choice unless a porter steps up to fix it. If the bugs would be important at most, then skipping broken tests on s390x sounds like the better option. The timing of removal bugs is hard to predict.
>>
>> Paul
>>
>> PS: I would not disable building on s390x if you have the test suite finding out severe problems (as the d/control file doesn't have negated architecture fields yet). Just getting the binary removed and FTBFS will prevent the architecture from building again.

Otherwise I think we need to go this route.

Paul
Re: Bug#1018224: src:exempi: fails to migrate to testing for too long: FTBFS on s390x
Hi,

On 07-09-2022 21:57, Michael Biebl wrote:
> I fail to see the connection between exempi and atop. Have you copied the wrong bug report number?

Seems like this should have gone to 1016937 indeed (doing so now).

Paul

> Michael
>
> On 07.09.22 at 19:00, Dipak Zope1 wrote:
>> Hi Paul,
>>
>> Setting the environment variable ATOPACCT to empty value disables this issue. Please use this workaround in the caller script of atop till we get a final fix.
>>
>> export ATOPACCT=""
>>
>> The behaviour is described in the source as below:
>>
>> /*
>> ** when a particular environment variable is present, atop should
>> ** use a specific accounting-file (as defined by the environment
>> ** variable) or should use no process accounting at all (when
>> ** contents of environment variable is empty)
>> */
>>
>> -Dipak
>>
>> From: Dipak Zope1
>> Date: Tuesday, 30 August 2022 at 3:28 PM
>> To: Paul Gevers, 1018...@bugs.debian.org <1018...@bugs.debian.org>, debian-s390
>> Subject: [EXTERNAL] RE: src:exempi: fails to migrate to testing for too long: FTBFS on s390x
>>
>> Apologies for late response. It looks like the issue is related to the synchronization between atop and atopacctd. I am looking into it further and will keep this thread updated.
>> I am looking forward to have a fix for this for s390x.
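A minimal sketch of applying the quoted ATOPACCT workaround to a single command instead of exporting it for the whole script. The wrapper name `without_atop_acct` is an assumption for illustration; the only behaviour relied on is the one described in the quoted atop source comment (an empty ATOPACCT means no process accounting).

```shell
# Hypothetical wrapper: run the given command with ATOPACCT set to the
# empty string, leaving the caller's environment untouched.
without_atop_acct() {
    ATOPACCT="" "$@"
}
```

In a test script this could be used as `without_atop_acct atop -P cpu 5 1`, which keeps the workaround local and easy to drop once a final fix lands.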
Re: src:exempi: fails to migrate to testing for too long: FTBFS on s390x
Hi Michael

On 29-08-2022 14:23, Michael Biebl wrote:
> As you are probably aware, this issue is known and tracked in [1].

Which I added as a blocker and mentioned in my message, so yes.

> The package FTBFS after enabling the test suite. I raised this issue upstream but there is no real interest/motivation [2] on their part to address these (most likely endianness related) issues.
> So I informed the s390x porters as well but got no feedback so far.

Ack, I saw the latter part.

> To me it seems it's better to not continue to ship a known broken package on s390x and think a partial architecture removal is probably the better alternative.

If you think the package indeed is severely broken, then removal sounds best. If it's broken in some less common use cases, it may be OK to leave it for now (skipping those tests on s390x) and let the porters have a look when they have time.

> Let me know what you think

It all depends on how broken it is. If you would consider the bugs found by the tests RC, then removal is the better choice unless a porter steps up to fix it. If the bugs would be important at most, then skipping broken tests on s390x sounds like the better option. The timing of removal bugs is hard to predict.

Paul

PS: I would not disable building on s390x if you have the test suite finding out severe problems (as the d/control file doesn't have negated architecture fields yet). Just getting the binary removed and FTBFS will prevent the architecture from building again.
Re: Bug#1016937: atop: autopkgtest regression on arm64 and armhf and times out on s390x
Hi,

On 13-08-2022 21:34, Marc Haber wrote:
>> running atop from unstable also hangs:
>> root@elbrus:~# atop
>> ^C
>
> on zelenka, running the atop binary just works fine. Installing atop 2.7.1-2 in a DD chroot on zelenka also works fine, and the binary is ok as well. However, the chroots don't start the services though.

Progress. Now, instead of killing it, I sent it to the background and when I then take it to the foreground, it works as expected.

root@ci-worker-s390x-01:~# atop
^Z
[1]+ Stopped atop
root@ci-worker-s390x-01:~# fg
atop
root@ci-worker-s390x-01:~#

Same with your command in the test:

root@ci-worker-s390x-01:~# atop -P cpu 5 1
^Z
[1]+ Stopped atop -P cpu 5 1
root@ci-worker-s390x-01:~# fg
atop -P cpu 5 1
RESET
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 0 12314475 57940088 197207 116525509 1229493 133423 982033 4278583 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 1 13096470 56792358 204646 118023945 1290960 133142 321874 3737087 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 2 12982530 56925413 209005 117993872 1288573 131703 322564 3746751 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 3 13465982 56697100 208873 117747350 1287548 131114 322660 3739777 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 4 13639265 56795653 213211 117476209 1276394 130964 321365 3747339 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 5 13326756 56460169 202500 118173964 1261805 129906 322232 3723116 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 6 12968736 56176871 207863 118788707 1265701 130806 329336 3732416 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 7 13026985 56068710 211225 118856524 1248204 130943 321583 3736213 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 8 14194105 56997563 204065 116748001 1264309 130682 320834 3740854 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 9 13285438 56060337 205755 118583081 1279057 130206 323123 3733407 0 0 100 0 0
SEP

Anybody any clue?

Paul
Re: Bug#1016937: atop: autopkgtest regression on arm64 and armhf and times out on s390x
Hi,

[tl;dr: atop seems to hang on s390x]

On 12-08-2022 12:23, Marc Haber wrote:
> On Thu, Aug 11, 2022 at 10:51:32PM +0200, Paul Gevers wrote:
>> On 10-08-2022 12:03, Marc Haber wrote:
>>> Unfortunately, this bug report suffers from multiple cut&paste or template errors. The ci link points to the mercurial page for amd64, the text alternates between s390x, armhf, arm64 and amd64.
>>
>> There was only one that I'm aware of, the link to mercurial. But I understand it if the text was a bit confusing.
>
> You said autopkgtest fails on amd64, which was never the case. Maybe amd64 and arm64 got confused.

What I *wanted* to convey is that arm64 and amd64 *failures* are in our RC policy and all other *regressions* are RC too. I did mix that up.

>>> I tried the (dead simple) autopkgtest on the s390x and arm64 porterboxes and it succeeded in a second's time. I have sharpened the expression that counts the CPUs in lscpu's output and hope this will fix the issue.
>>
>> ooo, CPU count. Yes, some of those archs run on hosts with lots of CPUs. armhf has 160, s390x has 10.
>
> I am testing locally on amd64 with a machine with 12 CPUs. The armhf tests succeed (see https://ci.debian.net/data/autopkgtest/testing/armhf/a/atop/24578667/log.gz).

Great, same on arm64. s390x still times out though.

> The complete test is:
>
> #!/bin/bash
> # atop reports number of CPU and two extra lines
> ATOPSOPINION="$(atop -P cpu 5 1 | grep -vE '^(RESET|SEP)' | wc -l)"

When I run `atop` manually (on stable), it doesn't do anything...

root@ci-worker-s390x-01:~# atop
^C

I started up a clean unstable lxc container and installing atop takes quite some time between:

Created symlink /etc/systemd/system/timers.target.wants/atop-rotate.timer -> /lib/systemd/system/atop-rotate.timer.
Created symlink /etc/systemd/system/multi-user.target.wants/atop.service -> /lib/systemd/system/atop.service.
Created symlink /etc/systemd/system/multi-user.target.wants/atopacct.service -> /lib/systemd/system/atopacct.service.

and

Could not execute systemctl: at /usr/bin/deb-systemd-invoke line 145.

Running atop from unstable also hangs:

root@elbrus:~# atop
^C

> There is no loop, and nothing that could fail on a big number. In my understanding, this could run on a box with 2000 cores and still work.

Except, it doesn't. Seems like atop is seriously broken on s390x on the hosts that we have.

> Also, the test does not time out on zelenka when manually invoked in an schroot (setting PATH to point to an executable atop is necessary, as it does not seem to be possible to install an arbitrary package that is not in the archive). Also, the test is successful if invoked after installing atop 2.7.1-2 from the archive.
>
> Maybe we need to involve the s390x porters?

I put them in CC to already draw their attention.

Paul
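Because the failure mode here is a hang rather than a wrong count, a variant of the quoted test line that gives up after a fixed time may be easier to debug than a CI timeout. This is a minimal sketch, not the test shipped in the atop package: the function names and the 30-second limit are assumptions, while `timeout` is the coreutils command and the `atop -P cpu 5 1` invocation is taken from the message.

```shell
# Count the per-CPU records in `atop -P cpu` output read from stdin,
# ignoring the RESET and SEP marker lines.
count_cpu_records() {
    grep -vcE '^(RESET|SEP)'
}

# Hypothetical wrapper: stop atop after 30 seconds instead of hanging.
# If atop is killed by timeout, the count reflects whatever was printed
# so far (possibly 0), so the caller's comparison fails quickly.
atop_cpu_count() {
    timeout 30 atop -P cpu 5 1 | count_cpu_records
}
```

The caller would then compare the result of `atop_cpu_count` with the CPU count it derives from lscpu, as the original test does.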
Re: src:vkd3d: fails to migrate to testing for too long: FTBFS on s390x
Dear s390x porters,

As a Release Manager, I'm asking for your help with the FTBFS of vkd3d on s390x. The failure is blocking migration of the key package vkd3d.

Paul

On Fri, 29 Apr 2022 08:54:51 +0200 Paul Gevers wrote:
> The Release Team considers packages that are out-of-sync between testing and unstable for more than 60 days as having a Release Critical bug in testing [1]. Your package src:vkd3d has been trying to migrate for 61 days [2]. Hence, I am filing this bug.
>
> Your package fails to build from source on s390x while it built successfully there in the past.
Re: qt6-declarative fails to build on s390x
Hi,

On 18-04-2022 21:38, Patrick Franz wrote:
> this build failure is fixed in version 6.2.4+dfsg-4 of qt6-declarative (https://buildd.debian.org/status/package.php?p=qt6-declarative&suite=experimental).

Awesome. Sorry for not spotting that myself.

> The package is currently in experimental, because it will very soon be part of the Qt 6.2.4 transition and that should solve the issue.

Of course.

Paul
qt6-declarative fails to build on s390x
Hi all,

Today I stumbled upon some packages that failed to migrate to testing because they switched their build dependency to qt6-declarative. It seems to me that this package is relatively new and a replacement for something we already had for a while. Apparently that built on s390x, so this failure feels like a regression.

Can somebody maybe look at why qt6-declarative fails on s390x and fix it?

Paul

https://buildd.debian.org/status/fetch.php?pkg=qt6-declarative&arch=s390x&ver=6.2.2%2Bdfsg-4&stamp=1644966259&raw=0

-- Build files have been written to: /<>/obj-s390x-linux-gnu
make[1]: Leaving directory '/<>'
dh_auto_build -a -O--buildsystem=cmake\+ninja
cd obj-s390x-linux-gnu && LC_ALL=C.UTF-8 ninja -j2 -v
[1/2549] cd /<>/obj-s390x-linux-gnu/src/qmltyperegistrar && /usr/bin/cmake -E cmake_autogen /<>/obj-s390x-linux-gnu/src/qmltyperegistrar/CMakeFiles/qmltyperegistrar_autogen.dir/AutogenInfo.json None && /usr/bin/cmake -E touch /<>/obj-s390x-linux-gnu/src/qmltyperegistrar/qmltyperegistrar_autogen/timestamp && /usr/bin/cmake -E cmake_transform_depfile Ninja gccdepfile /<> /<>/src/qmltyperegistrar /<>/obj-s390x-linux-gnu /<>/obj-s390x-linux-gnu/src/qmltyperegistrar /<>/obj-s390x-linux-gnu/src/qmltyperegistrar/qmltyperegistrar_autogen/deps /<>/obj-s390x-linux-gnu/CMakeFiles/d/43adb0291f7798469ad331ae4b3d28658a39659d3080b682c231b04dadff33f0.d
[2/2549] /usr/bin/c++ -DQT_CORE_LIB -DQT_NO_CAST_FROM_ASCII -DQT_NO_CAST_TO_ASCII -DQT_NO_DEBUG -DQT_NO_EXCEPTIONS -DQT_NO_NARROWING_CONVERSIONS_IN_CONNECT -DQT_USE_QSTRINGBUILDER -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE -I/<>/obj-s390x-linux-gnu/src/qmltyperegistrar/qmltyperegistrar_autogen/include -I/<>/src/qmltyperegistrar -I/<>/obj-s390x-linux-gnu/src/qmltyperegistrar -I/<>/src/qmltyperegistrar/../qmlcompiler -isystem /usr/include/s390x-linux-gnu/qt6/QtCore -isystem /usr/include/s390x-linux-gnu/qt6 -isystem /usr/lib/s390x-linux-gnu/qt6/mkspecs/linux-g++ -isystem /usr/include/s390x-linux-gnu/qt6/QtCore/6.2.2 -isystem /usr/include/s390x-linux-gnu/qt6/QtCore/6.2.2/QtCore -g -O2 -ffile-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIE -fvisibility=hidden -fvisibility-inlines-hidden -Wall -Wextra -fno-exceptions -pthread -Wsuggest-override -MD -MT src/qmltyperegistrar/CMakeFiles/qmltyperegistrar.dir/qmltyperegistrar_autogen/mocs_compilation.cpp.o -MF src/qmltyperegistrar/CMakeFiles/qmltyperegistrar.dir/qmltyperegistrar_autogen/mocs_compilation.cpp.o.d -o src/qmltyperegistrar/CMakeFiles/qmltyperegistrar.dir/qmltyperegistrar_autogen/mocs_compilation.cpp.o -c /<>/obj-s390x-linux-gnu/src/qmltyperegistrar/qmltyperegistrar_autogen/mocs_compilation.cpp
[3/2549] cd /<>/obj-s390x-linux-gnu/src/qml && /usr/lib/qt6/libexec/qlalr --no-debug --qt /<>/src/qml/parser/qqmljs.g
[4/2549] /usr/bin/c++ -DQT_CORE_LIB -DQT_NO_CAST_FROM_ASCII -DQT_NO_CAST_TO_ASCII -DQT_NO_DEBUG -DQT_NO_EXCEPTIONS -DQT_NO_NARROWING_CONVERSIONS_IN_CONNECT -DQT_USE_QSTRINGBUILDER -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE -I/<>/obj-s390x-linux-gnu/src/qmltyperegistrar/qmltyperegistrar_autogen/include -I/<>/src/qmltyperegistrar -I/<>/obj-s390x-linux-gnu/src/qmltyperegistrar -I/<>/src/qmltyperegistrar/../qmlcompiler -isystem /usr/include/s390x-linux-gnu/qt6/QtCore -isystem /usr/include/s390x-linux-gnu/qt6 -isystem /usr/lib/s390x-linux-gnu/qt6/mkspecs/linux-g++ -isystem /usr/include/s390x-linux-gnu/qt6/QtCore/6.2.2 -isystem /usr/include/s390x-linux-gnu/qt6/QtCore/6.2.2/QtCore -g -O2 -ffile-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIE -fvisibility=hidden -fvisibility-inlines-hidden -Wall -Wextra -fno-exceptions -pthread -Wsuggest-override -MD -MT src/qmltyperegistrar/CMakeFiles/qmltyperegistrar.dir/__/qmlcompiler/qqmljsstreamwriter.cpp.o -MF src/qmltyperegistrar/CMakeFiles/qmltyperegistrar.dir/__/qmlcompiler/qqmljsstreamwriter.cpp.o.d -o src/qmltyperegistrar/CMakeFiles/qmltyperegistrar.dir/__/qmlcompiler/qqmljsstreamwriter.cpp.o -c /<>/src/qmlcompiler/qqmljsstreamwriter.cpp
[5/2549] /usr/bin/c++ -DQT_CORE_LIB -DQT_NO_CAST_FROM_ASCII -DQT_NO_CAST_TO_ASCII -DQT_NO_DEBUG -DQT_NO_EXCEPTIONS -DQT_NO_NARROWING_CONVERSIONS_IN_CONNECT -DQT_USE_QSTRINGBUILDER -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE -I/<>/obj-s390x-linux-gnu/src/qmltyperegistrar/qmltyperegistrar_autogen/include -I/<>/src/qmltyperegistrar -I/<>/obj-s390x-linux-gnu/src/qmltyperegistrar -I/<>/src/qmltyperegistrar/../qmlcompiler -isystem /usr/include/s390x-linux-gnu/qt6/QtCore -isystem /usr/include/s390x-linux-gnu/qt6 -isystem /usr/lib/s390x-linux-gnu/qt6/mkspecs/linux-g++ -isystem /usr/include/s390x-linux-gnu/qt6/QtCore/6.2.2 -isystem /usr/include/s390x-linux-gnu/qt6/QtCore/6.2.2/QtCore -g -O2 -ffile-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIE -fvisibility=hidden -fvisibilit
where to search for s390x workers for ci.debian.net?
Hi,

Results of autopkgtests in Debian packages are used to influence package migration from unstable to testing. I recently started to expand the number of architectures that ci.debian.net covers. We currently have amd64, arm64, armhf, i386 and ppc64el. I have good hopes that we can expand the number of ppc64el workers and that we will get mips64el workers soon (which can also run mipsel). Besides armel, that leaves s390x as the only release architecture without autopkgtest coverage.

Do any of you have ideas where we could get access to s390x hosts to run autopkgtests on?

Paul
Re: The (uncalled for) toolchain maintainers roll call for stretch
Hi,

On 10-09-16 00:48, Matthias Klose wrote:
> - fpc not available on powerpc anymore (may have changed recently)

For whatever it is worth, this was finally fixed this week. It is missing on mips*, ppc64el and s390x though, while at least some form of MIPS is supported upstream.

Paul