Bug#1015272: liburing autopkgtest started to hang containers in Debian and Ubuntu since ~2022-07-11

2023-03-05 Thread Salvatore Bonaccorso
Hi Paul,

On Sat, Mar 04, 2023 at 05:22:32PM +0100, Paul Gevers wrote:
> Hi,
> 
> On 04-03-2023 17:19, Salvatore Bonaccorso wrote:
> > So would agree, it make sense to update all remaining hosts and then
> > look into it again in case the problem arise again.
> 
> All but s390x are up-to-date and liburing testing works. I can't upgrade
> s390x because of bug #1031753 (which is worse for ci.d.n than this bug as
> far as I see).

OK, understood! The fix for that is included in the next upload.

Regards,
Salvatore



Bug#1015272: liburing autopkgtest started to hang containers in Debian and Ubuntu since ~2022-07-11

2023-03-04 Thread Paul Gevers

Hi,

On 04-03-2023 17:19, Salvatore Bonaccorso wrote:
> So would agree, it make sense to update all remaining hosts and then
> look into it again in case the problem arise again.

All but s390x are up-to-date and liburing testing works. I can't upgrade
s390x because of bug #1031753 (which is worse for ci.d.n than this bug
as far as I see).


Paul



Bug#1015272: liburing autopkgtest started to hang containers in Debian and Ubuntu since ~2022-07-11

2023-03-04 Thread Salvatore Bonaccorso
Hi,

On Thu, Mar 02, 2023 at 12:47:53PM +0100, Guillem Jover wrote:
> Hi!
> 
> On Thu, 2023-03-02 at 12:12:06 +0100, Paul Gevers wrote:
> > Control: fixed -1 5.10.162-1 6.1.8-1
> 
> > On Sun, 21 Aug 2022 21:35:58 +0200 Bastian Blank  wrote:
> > > On Sun, Aug 21, 2022 at 07:42:10PM +0200, Guillem Jover wrote:
> > > > It seems like there was a regression with the latest stable update
> > > > that affects the autopkgtest for liburing. Reassigning.
> > > 
> > > Please provide enough information to make isolating the problem
> > > possible.
> > > 
> > > https://ci.debian.net/packages/libu/liburing/ is completely silent as
> > > there are not results for any of the failed runs.
> 
> > I decided to try again to see if I could collect more information. The test
> > now passes on amd64, arm64, i386 and ppc64el, all running 5.10.162-1 and on
> > riscv64 running unstable. However, on armhf, armel (amd64 kernel) and s390x
> > (all running 5.10.158-2), it seems that the observation of brian is still
> > true, some test in test-unit test segfaults, the test exits and hangs.
> > @Guillem, do you see something more in the output below (armhf log) that may
> > be of interest? And maybe spot something to run in isolation?
> 
> > Reading the changelog of 5.10.162-1 I see io_uring mentioned a couple of
> > times. Therefor I assume this bug is fixed in that version. Is it worth
> > pursuing the real issue here?
> 
> Thanks for looking into this! As this appears fixed in latest Linux
> releases, then I'd honestly not bother further, and just try to get the
> remaining hosts upgraded to the fixed versions. If this was to reappear
> then it might make sense to look into it again.

FWIW, that is correct: the upstream 5.10.162 release was basically the
one that brought the io_uring implementation in line with the 5.15.y
series (with the io_uring codebase from 5.15.85).

So I would agree, it makes sense to update all remaining hosts and then
look into it again in case the problem arises again.

Regards,
Salvatore



Bug#1015272: liburing autopkgtest started to hang containers in Debian and Ubuntu since ~2022-07-11

2023-03-02 Thread Guillem Jover
Hi!

On Thu, 2023-03-02 at 12:12:06 +0100, Paul Gevers wrote:
> Control: fixed -1 5.10.162-1 6.1.8-1

> On Sun, 21 Aug 2022 21:35:58 +0200 Bastian Blank  wrote:
> > On Sun, Aug 21, 2022 at 07:42:10PM +0200, Guillem Jover wrote:
> > > It seems like there was a regression with the latest stable update
> > > that affects the autopkgtest for liburing. Reassigning.
> > 
> > Please provide enough information to make isolating the problem
> > possible.
> > 
> > https://ci.debian.net/packages/libu/liburing/ is completely silent as
> > there are not results for any of the failed runs.

> I decided to try again to see if I could collect more information. The test
> now passes on amd64, arm64, i386 and ppc64el, all running 5.10.162-1 and on
> riscv64 running unstable. However, on armhf, armel (amd64 kernel) and s390x
> (all running 5.10.158-2), it seems that the observation of brian is still
> true, some test in test-unit test segfaults, the test exits and hangs.
> @Guillem, do you see something more in the output below (armhf log) that may
> be of interest? And maybe spot something to run in isolation?

> Reading the changelog of 5.10.162-1 I see io_uring mentioned a couple of
> times. Therefor I assume this bug is fixed in that version. Is it worth
> pursuing the real issue here?

Thanks for looking into this! As this appears fixed in latest Linux
releases, then I'd honestly not bother further, and just try to get the
remaining hosts upgraded to the fixed versions. If this was to reappear
then it might make sense to look into it again.

Regards,
Guillem



Bug#1015272: liburing autopkgtest started to hang containers in Debian and Ubuntu since ~2022-07-11

2023-03-02 Thread Paul Gevers

Control: fixed -1 5.10.162-1 6.1.8-1

Hi,

On Sun, 21 Aug 2022 21:35:58 +0200 Bastian Blank wrote:
> On Sun, Aug 21, 2022 at 07:42:10PM +0200, Guillem Jover wrote:
> > It seems like there was a regression with the latest stable update
> > that affects the autopkgtest for liburing. Reassigning.
>
> Please provide enough information to make isolating the problem
> possible.
>
> https://ci.debian.net/packages/libu/liburing/ is completely silent as
> there are not results for any of the failed runs.


I decided to try again to see if I could collect more information. The
test now passes on amd64, arm64, i386 and ppc64el, all running
5.10.162-1, and on riscv64 running unstable. However, on armhf, armel
(amd64 kernel) and s390x (all running 5.10.158-2), it seems that the
observation of Brian is still true: some test in the test-unit test
segfaults, the test exits and hangs. @Guillem, do you see something more
in the output below (armhf log) that may be of interest? And maybe spot
something to run in isolation?


When I try to destroy the lxc, that fails and in ps output I see this:
root     3053528  0.0  0.0   5388  3072 ?        Ss   03:34   0:00 [lxc monitor] /var/lib/lxc ci-061-8c60e21c
root     3061512  0.0  0.0      0     0 ?        Ss   03:35   0:00  \_ [systemd]
debian   3110684  0.0  0.0   2140   192 ?        DL   03:37   0:00      \_ ./iopoll-leak.t


Note the "D" state.
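For readers who want to look for such processes themselves, here is a minimal sketch (not something used in this thread) that scans /proc for tasks in the uninterruptible "D" state. The function names are made up for illustration; the only thing taken from the kernel documentation is the layout of /proc/<pid>/stat, where the state letter is the first field after the parenthesised command name.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_str), comm, state


def find_uninterruptible(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process went away or the entry is not readable
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in find_uninterruptible():
        print(pid, comm)
```

A process that stays in this list across repeated runs (like ./iopoll-leak.t above) is blocked inside the kernel and cannot be killed from user space, which would explain why destroying the container fails.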

Reading the changelog of 5.10.162-1 I see io_uring mentioned a couple of
times. Therefore I assume this bug is fixed in that version. Is it worth
pursuing the real issue here?


Paul

root@ci-061-705317d0:/tmp/autopkgtest-lxc.v8gx_5j5/downtmp# cat test-unit-stdout

+ [ -n  ]
+ CC=gcc
+ ./configure --cc=gcc
prefix                        /usr
includedir                    /usr/include
libdir                        /usr/lib
libdevdir                     /usr/lib
relativelibdir
mandir                        /usr/man
datadir                       /usr/share
stringop_overflow             yes
array_bounds                  yes
__kernel_rwf_t                yes
__kernel_timespec             yes
open_how                      yes
statx                         yes
glibc_statx                   yes
C++                           yes
has_ucontext                  yes
NVMe uring command support    yes
liburing_nolibc               no
CC                            gcc
CXX                           g++
+ make runtests
make[1]: Entering directory '/tmp/autopkgtest-lxc.v8gx_5j5/downtmp/build.ksh/src/src'

 CC setup.ol
 CC queue.ol
 CC register.ol
 CC syscall.ol
 AR liburing.a
ar: creating liburing.a
 RANLIB liburing.a
 CC setup.os
 CC queue.os
 CC register.os
 CC syscall.os
 CC liburing.so.2.3
make[1]: Leaving directory '/tmp/autopkgtest-lxc.v8gx_5j5/downtmp/build.ksh/src/src'
make[1]: Entering directory '/tmp/autopkgtest-lxc.v8gx_5j5/downtmp/build.ksh/src/test'

 CC helpers.o
 CC 232c93d07b74.t
 CC 35fa71a030ca.t
 CC 500f9fbadef8.t
 CC 7ad0e4b2f83c.t
 CC 8a9973408177.t
 CC 917257daa0fe.t
 CC a0908ae19763.t
 CC a4c0b3decb33.t
 CC accept.t
 CC accept-link.t
 CC accept-reuse.t
 CC accept-test.t
 CC across-fork.t
 CC b19062a56726.t
 CC b5837bd5311d.t
 CC buf-ring.t
 CC ce593a6c480a.t
 CC close-opath.t
 CC connect.t
 CC cq-full.t
 CC cq-overflow.t
 CC cq-peek-batch.t
 CC cq-ready.t
 CC cq-size.t
 CC d4ae271dfaae.t
 CC d77a67ed5f27.t
 CC defer.t
 CC defer-taskrun.t
 CC double-poll-crash.t
 CC drop-submit.t
 CC eeed8b54e0df.t
 CC empty-eownerdead.t
 CC eventfd.t
 CC eventfd-disable.t
 CC eventfd-reg.t
 CC eventfd-ring.t
 CC exec-target.t
 CC exit-no-cleanup.t
 CC fadvise.t
 CC fallocate.t
 CC fc2a85cb02ef.t
 CC fd-pass.t
 CC file-register.t
 CC files-exit-hang-poll.t
 CC files-exit-hang-timeout.t
 CC file-update.t
 CC file-verify.t
 CC fixed-buf-iter.t
 CC fixed-link.t
 CC fixed-reuse.t
 CC fpos.t
 CC fsync.t
 CC hardlink.t
 CC io-cancel.t
 CC iopoll.t
 CC iopoll-leak.t
 CC io_uring_enter.t
 CC io_uring_passthrough.t
 CC io_uring_register.t
 CC io_uring_setup.t
 CC lfs-openat.t
 CC lfs-openat-write.t
 CC link.t
 CC link_drain.t
 CC link-timeout.t
 CC madvise.t
 CC mkdir.t
 CC msg-ring.t
 CC multicqes_drain.t
 CC nolibc.t
 CC nop-all-sizes.t
 CC nop.t
 CC openat2.t
 CC open-close.t
 CC open-direct-link.t
 CC open-direct-pick.t
 CC personality.t
 CC pipe-eof.t
 CC pipe-reuse.t
 CC poll.t
 CC poll-cancel.t
 CC poll-cancel-all.t
 CC poll-cancel-ton.t
 CC poll-link.t
 CC poll-many.t
 CC poll-mshot-update.t
 CC poll-mshot-overflow.t
 CC poll-ring.t
 CC poll-v-poll.t
 CC pollfree.t
 CC probe.t
 CC read-before-exit.t
 CC read-write.t
 CC recv-msgall.t
 CC 

Bug#1015272: liburing autopkgtest started to hang containers in Debian and Ubuntu since ~2022-07-11

2022-08-21 Thread Bastian Blank
Control: tags -1 moreinfo
Control: severity -1 normal

On Sun, Aug 21, 2022 at 07:42:10PM +0200, Guillem Jover wrote:
> It seems like there was a regression with the latest stable update
> that affects the autopkgtest for liburing. Reassigning.

Please provide enough information to make isolating the problem
possible.

https://ci.debian.net/packages/libu/liburing/ is completely silent as
there are no results for any of the failed runs.

Bastian

-- 
There is an order of things in this universe.
-- Apollo, "Who Mourns for Adonais?" stardate 3468.1



Bug#1015272: liburing autopkgtest started to hang containers in Debian and Ubuntu since ~2022-07-11

2022-08-21 Thread Guillem Jover
Control: reassign -1 linux
Control: affects -1 liburing debci

Hi!

It seems like there was a regression with the latest stable update
that affects the autopkgtest for liburing. Reassigning.

On Mon, 2022-07-18 at 21:07:28 +0200, Paul Gevers wrote:
> Source: liburing
> Version: 2.1-2
> Severity: important
> X-Debbugs-Cc: br...@ubuntu.com

> Some days ago (approximately 7) the autopkgtest of liburing started to
> behave badly on the Debian and Ubuntu infrastructure. It's not totally
> clear to me what happens, but I have lxc containers left behind after
> the test that I can't clean up.
> 
> A similar thing seems to happen on the Ubuntu side because they have
> blocked the test from running recently and I'll do the same on our
> side.
> 
> https://git.launchpad.net/~ubuntu-release/autopkgtest-cloud/+git/autopkgtest-package-configs/commit/?id=fedf747ea808837217d373773d105f242819702d
> 
> Because your package didn't change in that time, I suspect one of your
> dependencies caused liburing to behave differently. It would be great
> if we figured out what that is.

Thanks,
Guillem



Bug#1015272: liburing autopkgtest started to hang containers in Debian and Ubuntu since ~2022-07-11

2022-08-20 Thread Paul Gevers

Hi,

On 20-08-2022 13:44, Guillem Jover wrote:
> The liburing test suite works on the buildds, I just uploaded a new
> release and it built fine locally with Linux 5.19.0-trunk-amd64, and
> on the buildds. I think this needs to be reassigned, I'd assume to the
> kernel (but I'm not entirely sure).

I'd say, go ahead.

> Also could a newer kernel be tried on the infra?

If it's in a reasonable archive and the version sorting is such that
it's below the next stable or security update version, yes. But I don't
want to deviate from the stable (security) archive longer than needed.
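
To illustrate the version-sorting condition, here is a rough sketch of the Debian version ordering rules as described in Debian Policy (epoch, then upstream version, then revision; '~' sorts before everything, letters before other characters, digit runs numerically). It is an approximation for illustration only; `dpkg --compare-versions` remains the authoritative tool. The helper names are made up, and the "~bpo11+1" version in the example is hypothetical; the other version strings are the ones mentioned in this bug.

```python
import re


def _order(ch):
    # '~' sorts before everything (even end of string, which counts as 0),
    # letters sort before all other non-digit characters.
    if ch == "~":
        return -1
    if ch.isalpha():
        return ord(ch)
    return ord(ch) + 256


def _cmp_nondigit(a, b):
    for k in range(max(len(a), len(b))):
        ac = _order(a[k]) if k < len(a) else 0
        bc = _order(b[k]) if k < len(b) else 0
        if ac != bc:
            return -1 if ac < bc else 1
    return 0


def _cmp_part(a, b):
    # Split into alternating (non-digit, digit) chunks and compare pairwise.
    pa = re.findall(r"([^0-9]*)([0-9]*)", a)
    pb = re.findall(r"([^0-9]*)([0-9]*)", b)
    for k in range(max(len(pa), len(pb))):
        na, da = pa[k] if k < len(pa) else ("", "")
        nb, db = pb[k] if k < len(pb) else ("", "")
        c = _cmp_nondigit(na, nb)
        if c:
            return c
        ia, ib = int(da or 0), int(db or 0)
        if ia != ib:
            return -1 if ia < ib else 1
    return 0


def _parse(version):
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a, b):
    """Return -1, 0 or 1 if a sorts before, equal to, or after b."""
    ea, ua, ra = _parse(a)
    eb, ub, rb = _parse(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _cmp_part(ua, ub) or _cmp_part(ra, rb)


if __name__ == "__main__":
    # Hypothetical candidate kernel vs. a version from the stable archive.
    print(compare_versions("6.1.8-1~bpo11+1", "6.1.8-1"))
```

With this ordering a backport-style revision such as the hypothetical "6.1.8-1~bpo11+1" sorts below "6.1.8-1", so a later archive upload would still replace it on upgrade.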


Paul



Bug#1015272: liburing autopkgtest started to hang containers in Debian and Ubuntu since ~2022-07-11

2022-08-20 Thread Guillem Jover
Hi!


On Tue, 2022-07-19 at 08:28:09 +0200, Paul Gevers wrote:
> On 18-07-2022 23:44, Guillem Jover wrote:
> > While only having skimmed over the logs, the first suspect for me would
> > be the Linux kernel, where liburing might be triggering some kernel bug.
> > Were the systems upgraded around that time with a newer kernel version
> > (is that a 5.18.x from bpo)?
> 
> In Debian we did have a point release on 2022-07-09, I upgraded that evening
> (UTC).

So this seems to confirm that the problem lies elsewhere. :)

> > The only other options if that would not
> > have been the case that seem plausible would be glibc (which was uploaded
> > on the 10th, and could be a suspect), or lxc itself or similar, but that
> > has not been uploaded for a while in sid.
> 
> lxc comes from stable. Everything in it is either unstable or testing.
> 
> I'm not totally sure about the Ubuntu setup. Maybe Brian can comment on
> that.

The liburing test suite works on the buildds, I just uploaded a new
release and it built fine locally with Linux 5.19.0-trunk-amd64, and
on the buildds. I think this needs to be reassigned, I'd assume to the
kernel (but I'm not entirely sure).

Also could a newer kernel be tried on the infra?

Thanks,
Guillem



Bug#1015272: liburing autopkgtest started to hang containers in Debian and Ubuntu since ~2022-07-11

2022-07-19 Thread Paul Gevers

Hi,

On 18-07-2022 23:44, Guillem Jover wrote:
> While only having skimmed over the logs, the first suspect for me would
> be the Linux kernel, where liburing might be triggering some kernel bug.
> Were the systems upgraded around that time with a newer kernel version
> (is that a 5.18.x from bpo)?

In Debian we did have a point release on 2022-07-09, I upgraded that
evening (UTC).

> The only other options if that would not
> have been the case that seem plausible would be glibc (which was uploaded
> on the 10th, and could be a suspect), or lxc itself or similar, but that
> has not been uploaded for a while in sid.

lxc comes from stable. Everything in it is either unstable or testing.

I'm not totally sure about the Ubuntu setup. Maybe Brian can comment on
that.


Paul



Bug#1015272: liburing autopkgtest started to hang containers in Debian and Ubuntu since ~2022-07-11

2022-07-18 Thread Guillem Jover
Hi!

On Mon, 2022-07-18 at 21:07:28 +0200, Paul Gevers wrote:
> Source: liburing
> Version: 2.1-2
> Severity: important
> X-Debbugs-Cc: br...@ubuntu.com

> Some days ago (approximately 7) the autopkgtest of liburing started to
> behave badly on the Debian and Ubuntu infrastructure. It's not totally
> clear to me what happens, but I have lxc containers left behind after
> the test that I can't clean up.
> 
> A similar thing seems to happen on the Ubuntu side because they have
> blocked the test from running recently and I'll do the same on our
> side.
> 
> https://git.launchpad.net/~ubuntu-release/autopkgtest-cloud/+git/autopkgtest-package-configs/commit/?id=fedf747ea808837217d373773d105f242819702d
> 
> Because your package didn't change in that time, I suspect one of your
> dependencies caused liburing to behave differently. It would be great
> if we figured out what that is.

While only having skimmed over the logs, the first suspect for me would
be the Linux kernel, where liburing might be triggering some kernel bug.
Were the systems upgraded around that time with a newer kernel version
(is that a 5.18.x from bpo)? The only other options if that would not
have been the case that seem plausible would be glibc (which was uploaded
on the 10th, and could be a suspect), or lxc itself or similar, but that
has not been uploaded for a while in sid.

Thanks,
Guillem



Bug#1015272: liburing autopkgtest started to hang containers in Debian and Ubuntu since ~2022-07-11

2022-07-18 Thread Brian Murray
On Mon, Jul 18, 2022 at 09:07:28PM +0200, Paul Gevers wrote:
> Source: liburing
> Version: 2.1-2
> Severity: important
> X-Debbugs-Cc: br...@ubuntu.com
> 
> Dear Guillem,
> 
> Some days ago (approximately 7) the autopkgtest of liburing started to
> behave badly on the Debian and Ubuntu infrastructure. It's not totally
> clear to me what happens, but I have lxc containers left behind after
> the test that I can't clean up.
> 
> A similar thing seems to happen on the Ubuntu side because they have
> blocked the test from running recently and I'll do the same on our
> side.
> 
> https://git.launchpad.net/~ubuntu-release/autopkgtest-cloud/+git/autopkgtest-package-configs/commit/?id=fedf747ea808837217d373773d105f242819702d
> 
> Because your package didn't change in that time, I suspect one of your
> dependencies caused liburing to behave differently. It would be great
> if we figured out what that is.

I managed to capture some information from an instance in an Ubuntu bug
report - http://launchpad.net/bugs/1981636.

--
Brian Murray @ubuntu.com



Bug#1015272: liburing autopkgtest started to hang containers in Debian and Ubuntu since ~2022-07-11

2022-07-18 Thread Paul Gevers
Source: liburing
Version: 2.1-2
Severity: important
X-Debbugs-Cc: br...@ubuntu.com

Dear Guillem,

Some days ago (approximately 7) the autopkgtest of liburing started to
behave badly on the Debian and Ubuntu infrastructure. It's not totally
clear to me what happens, but I have lxc containers left behind after
the test that I can't clean up.

A similar thing seems to happen on the Ubuntu side because they have
blocked the test from running recently and I'll do the same on our
side.

https://git.launchpad.net/~ubuntu-release/autopkgtest-cloud/+git/autopkgtest-package-configs/commit/?id=fedf747ea808837217d373773d105f242819702d

Because your package didn't change in that time, I suspect one of your
dependencies caused liburing to behave differently. It would be great
if we figured out what that is.

Paul