Bug#954655: apparmor autopkgtest doesn't work nice on ci.d.n infrastructure

2021-04-02 Thread Paul Gevers
Hi

On 02-04-2021 18:54, intrigeri wrote:
> Paul Gevers (2021-04-02):
>> On 02-04-2021 14:00, intrigeri wrote:
>>> I would like to see the same 1-line change in Bullseye, in the hope
>>> that it's enough to allow you folks to remove src:apparmor from
>>> the blocklist.
>>
>> Shall we test first if it helps?
> 
> Sure :)
> 
> I understand I can't do this myself.

If I counted right, I ran 23 tests with the package from experimental
(22 on arm64 and 1 on amd64). There was one tmpfail for a different
reason. So indeed, this trick seems to work.

Paul



OpenPGP_signature
Description: OpenPGP digital signature


Bug#954655: apparmor autopkgtest doesn't work nice on ci.d.n infrastructure

2021-04-02 Thread Paul Gevers
Hi,

On 02-04-2021 18:54, intrigeri wrote:
> Paul Gevers (2021-04-02):
>> On 02-04-2021 14:00, intrigeri wrote:
>>> I would like to see the same 1-line change in Bullseye, in the hope
>>> that it's enough to allow you folks to remove src:apparmor from
>>> the blocklist.
>>
>> Shall we test first if it helps?
> 
> Sure :)
> 
> I understand I can't do this myself.

I just confirmed there's a bug in our infrastructure and you can
actually do this (I already did; scheduled 10 runs after the first two
succeeded). Apparently the retry API *doesn't* check for the rejectlist.
I'll file a bug against debci shortly.

Paul



Bug#954655: apparmor autopkgtest doesn't work nice on ci.d.n infrastructure

2021-04-02 Thread intrigeri
Paul Gevers (2021-04-02):
> On 02-04-2021 14:00, intrigeri wrote:
>> I would like to see the same 1-line change in Bullseye, in the hope
>> that it's enough to allow you folks to remove src:apparmor from
>> the blocklist.
>
> Shall we test first if it helps?

Sure :)

I understand I can't do this myself.



Bug#954655: apparmor autopkgtest doesn't work nice on ci.d.n infrastructure

2021-04-02 Thread Paul Gevers
Hi intrigeri,

On 02-04-2021 14:00, intrigeri wrote:
> I would like to see the same 1-line change in Bullseye, in the hope
> that it's enough to allow you folks to remove src:apparmor from
> the blocklist.

Shall we test first if it helps?

> Would you like to pre-approve this here, or do you prefer that
> I request pre-approval via the regular release team process?

If it works, and if all you do is change that one line and a changelog
entry, then I'll unblock it, yes. But first, let's see if it does what
you hope, no?

Paul



Bug#954655: apparmor autopkgtest doesn't work nice on ci.d.n infrastructure

2021-04-02 Thread intrigeri
Hi Paul,

Paul Gevers (2021-02-18):
> Hi intrigeri,
>
> On 18-02-2021 10:34, intrigeri wrote:
>>>   # Dummy test so that changes to linux-image-amd64 trigger our other autopkgtests
>>>   # on ci.debian.net
>
> By the way, we have the "hint-testsuite-triggers" for this.
>
>> Actually, apparmor-profiles-extra has the exact same test, and AFAICT
>> it seems to run pretty reliably there, so I now have doubts about the
>> hypothesis I quoted above.
>> 
>> Still, perhaps it's worth trying to add isolation-machine to that
>> test, remove src:apparmor from the blocklist, and see what happens?
>
> If you add that restriction (instead of the current ones), the test
> isn't run anywhere (and can't cause any issues).

Thanks!

In 3.0.1-6 (experimental) I've replaced the current restrictions with
hint-testsuite-triggers, i.e.:

   Test-Command: /bin/true
   Depends: linux-image-amd64 [amd64] | linux-image-generic [ amd64 ]
  -Restrictions: superficial, skip-not-installable
  +Restrictions: hint-testsuite-triggers

I've verified that the resulting .dsc has the same
Testsuite-Triggers field.

I would like to see the same 1-line change in Bullseye, in the hope
that it's enough to allow you folks to remove src:apparmor from
the blocklist.

Would you like to pre-approve this here, or do you prefer that
I request pre-approval via the regular release team process?

Cheers,
-- 
intrigeri



Bug#954655: apparmor autopkgtest doesn't work nice on ci.d.n infrastructure

2021-02-18 Thread Paul Gevers
Ooo,

On 18-02-2021 10:34, intrigeri wrote:
>>   # Dummy test so that changes to linux-image-amd64 trigger our other autopkgtests
>>   # on ci.debian.net
>>   Test-Command: /bin/true
>>   Depends: linux-image-amd64 [amd64] | linux-image-generic [ amd64 ]
>>   Restrictions: superficial, skip-not-installable
>>
>> … and this is the one I should mark as isolation-machine,
>> so we can resume running the other 2 tests on ci.d.n,
>> which I would very much like.
>>
>> Makes sense?
> 
> Actually, apparmor-profiles-extra has the exact same test, and AFAICT
> it seems to run pretty reliably there, so I now have doubts about the
> hypothesis I quoted above.
> 
> Still, perhaps it's worth trying to add isolation-machine to that
> test, remove src:apparmor from the blocklist, and see what happens?

By the way, I think this test only makes sense in the isolation-machine
case: our workers are nearly all running stable, and you get the host
kernel in your lxc, don't you?

Paul



Bug#954655: apparmor autopkgtest doesn't work nice on ci.d.n infrastructure

2021-02-18 Thread Paul Gevers
Hi intrigeri,

On 18-02-2021 10:34, intrigeri wrote:
>>   # Dummy test so that changes to linux-image-amd64 trigger our other autopkgtests
>>   # on ci.debian.net

By the way, we have the "hint-testsuite-triggers" for this.

> Actually, apparmor-profiles-extra has the exact same test, and AFAICT
> it seems to run pretty reliably there, so I now have doubts about the
> hypothesis I quoted above.
> 
> Still, perhaps it's worth trying to add isolation-machine to that
> test, remove src:apparmor from the blocklist, and see what happens?

If you add that restriction (instead of the current ones), the test
isn't run anywhere (and can't cause any issues).

Paul



Bug#954655: apparmor autopkgtest doesn't work nice on ci.d.n infrastructure

2021-02-18 Thread intrigeri
Hi,

intrigeri (2021-02-06):
> I understand the LXC container is stopped and restarted between each
> test, so what Paul wrote above suggests the container is successfully
> stopped between the 1st and 2nd test, same between the 2nd and 3rd
> test, but then it occasionally fails to stop after the last
> (3rd) test. Correct?
>
> If this analysis is correct, then the culprit has to be in the 3rd
> test i.e.:
>
>   # Dummy test so that changes to linux-image-amd64 trigger our other autopkgtests
>   # on ci.debian.net
>   Test-Command: /bin/true
>   Depends: linux-image-amd64 [amd64] | linux-image-generic [ amd64 ]
>   Restrictions: superficial, skip-not-installable
>
> … and this is the one I should mark as isolation-machine,
> so we can resume running the other 2 tests on ci.d.n,
> which I would very much like.
>
> Makes sense?

Actually, apparmor-profiles-extra has the exact same test, and AFAICT
it seems to run pretty reliably there, so I now have doubts about the
hypothesis I quoted above.

Still, perhaps it's worth trying to add isolation-machine to that
test, remove src:apparmor from the blocklist, and see what happens?

Cheers!



Bug#954655: apparmor autopkgtest doesn't work nice on ci.d.n infrastructure

2021-02-06 Thread intrigeri
Hi,

I've just run the src:apparmor autopkgtests on ci-worker05 a bunch
of times, using the autopkgtest command:

- from an unpacked source tree:

  - experimental (3.0.1-4): 5 times
  - unstable (2.13.6-8): 3 times

- using the package that's in the archive:

  - experimental (3.0.1-4): 2 times
  - unstable (2.13.6-8): 2 times
  - Buster (2.13.2-10): 2 times

Every time, the test succeeded, then the autopkgtest command exited
immediately, and the LXC container was promptly stopped and destroyed.

IOW, I was not able to reproduce the bug.
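
A repeated run like the one described above could be scripted roughly as follows. This is a sketch only: the container name is a placeholder, and the exact options used on ci-worker05 are not stated in this thread. The only command shape assumed is `autopkgtest <source> -- lxc <container>`.

```python
import subprocess
from typing import Callable, List, Sequence


def autopkgtest_argv(source: str, container: str) -> List[str]:
    """Build an autopkgtest command line for the lxc backend.

    `source` is "." for an unpacked source tree or a source package name
    for the version in the archive; `container` is a placeholder name.
    """
    return ["autopkgtest", source, "--", "lxc", container]


def repeat_runs(
    argv: Sequence[str],
    times: int,
    run: Callable[[List[str]], int] = subprocess.call,
) -> List[int]:
    """Run the same command `times` times and collect the exit statuses."""
    return [run(list(argv)) for _ in range(times)]


# Example (not executed here; the container name is hypothetical):
#   statuses = repeat_runs(autopkgtest_argv(".", "autopkgtest-unstable-amd64"), 5)
```

Whether the container was actually stopped and destroyed after each run still has to be checked separately on the worker; the exit status alone does not show that.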

So either I've been incredibly lucky, or the bug has already been
fixed somehow, or the bug is only triggered when the autopkgtest is
started by the debci worker service.

Any idea on how to proceed from here?
I suppose someone would have to be logged into the relevant worker
when the bug happens, in order to investigate further.

OTOH it's also tempting to wait until you upgrade the ci.d.n infra to
Bullseye and come back to it at that point, especially if there's
a simple short-term mitigation:

Paul Gevers wrote:
> Something in the test is very often preventing autopkgtest (the
> binary) from stopping and cleaning up the lxc container within the
> 600 seconds it gets to do that, which leads to a tmpfail for the
> apparmor autopkgtest and a still running lxc container on the
> worker. […] Your autopkgtest itself normally passes before causing
> the issue.

I understand the LXC container is stopped and restarted between each
test, so what Paul wrote above suggests the container is successfully
stopped between the 1st and 2nd test, same between the 2nd and 3rd
test, but then it occasionally fails to stop after the last
(3rd) test. Correct?

If this analysis is correct, then the culprit has to be in the 3rd
test i.e.:

  # Dummy test so that changes to linux-image-amd64 trigger our other autopkgtests
  # on ci.debian.net
  Test-Command: /bin/true
  Depends: linux-image-amd64 [amd64] | linux-image-generic [ amd64 ]
  Restrictions: superficial, skip-not-installable

… and this is the one I should mark as isolation-machine,
so we can resume running the other 2 tests on ci.d.n,
which I would very much like.

Makes sense?

Cheers!



Bug#954655: apparmor autopkgtest doesn't work nice on ci.d.n infrastructure

2021-02-06 Thread intrigeri
Hi Paul,

Paul Gevers (2021-02-05):
> On 05-02-2021 18:02, intrigeri wrote:
>> First, I'm wondering if this bug might be related to the problem you
>> recently fixed in debci's LXC containers AppArmor configuration.
>
> That was only on one particular ppc64el host, so that shouldn't impact
> the other architectures.

Ah, right :)

>> What about giving it another try?
>
> After the above, does that really make sense?

Not at all.

> I'll give further instructions in private.

Thanks!



Bug#954655: apparmor autopkgtest doesn't work nice on ci.d.n infrastructure

2021-02-05 Thread Paul Gevers
Hi intrigeri,

On 05-02-2021 18:02, intrigeri wrote:
> First, I'm wondering if this bug might be related to the problem you
> recently fixed in debci's LXC containers AppArmor configuration.

That was only on one particular ppc64el host, so that shouldn't impact
the other architectures.

> What about giving it another try?

After the above, does that really make sense?

> If that's not enough, then I'd like to come back to what we discussed
> a few months ago:

[...]

>> Sure, will do right away in a private email.
> 
> It seems I failed to follow up on this, sorry!
> 
> I suppose it'll be easier to coordinate this via IRC (although
> I don't have a permanently-connected client).

No, I forgot to follow up. I'll give further instructions in private.

Paul



Bug#954655: apparmor autopkgtest doesn't work nice on ci.d.n infrastructure

2021-02-05 Thread intrigeri
Hi,

First, I'm wondering if this bug might be related to the problem you
recently fixed in debci's LXC containers AppArmor configuration.
What about giving it another try?

If that's not enough, then I'd like to come back to what we discussed
a few months ago:

intrigeri (2020-10-27):
> Paul Gevers (2020-10-25):
>> On 25-10-2020 19:44, intrigeri wrote:
>>>>> Is there a better way for me to investigate?
>>> 
>>>> We have given DDs temporary access to one of our workers before. If
>>>> you're interested we could do that again for this case. That way you
>>>> could even skip the upload to experimental, assuming it reproduces if
>>>> run from a local tree. And you can check what's going on in the test bed.
>>> 
>>> This looks great. I'd like to do this once the updated baseline is
>>> established, if it still fails. I could book time for this on Nov
>>> 28-29.
>>
>> Can you already point me at your public key? Easiest is with a
>> signed e-mail, but if it's otherwise easily traceable it's from you, I
>> can take it from elsewhere.
>
> Sure, will do right away in a private email.

It seems I failed to follow up on this, sorry!

I suppose it'll be easier to coordinate this via IRC (although
I don't have a permanently-connected client).

Cheers!



Bug#954655: apparmor autopkgtest doesn't work nice on ci.d.n infrastructure

2020-10-27 Thread intrigeri
Hi,

Paul Gevers (2020-10-25):
> On 25-10-2020 19:44, intrigeri wrote:
>> This being said, this was a while ago, and I wonder if the problem got
>> somehow fixed in one of those packages in the meantime. Could you
>> please give it another try with 2.13.5-1 (sid) or 3.0.0-1
>> (experimental), and ideally both? This would establish an updated
>> baseline for further investigation.
>
> Scheduled on amd64. Note however that I reported that the test didn't
> always fail, so if it passes, it's not saying for sure that everything
> is OK.

> https://ci.debian.net/user/elbrus/jobs?package=apparmor

Thank you.

https://ci.debian.net/data/autopkgtest/unstable/amd64/a/apparmor/7743678/log.gz
(sid) did not expose the problem. As you said, it does not prove anything.

https://ci.debian.net/data/autopkgtest/unstable/amd64/a/apparmor/7743977/log.gz
(experimental) failed because I asked you to trigger this too early:
my 3.0.0-1 upload had not reached the archive. I've tried to
re-trigger it myself but it seems that's not possible since the
package is in the blocklist. Could you please re-trigger it?
(package = apparmor, suite = unstable, pin = "src:apparmor, experimental")
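
For reference, the three parameters above could be expressed as a JSON test request along these lines. This is a sketch that assumes the request layout of debci's self-service API (a list of objects with "package" and "pin-packages" keys); that layout is an assumption here and should be checked against the debci documentation. The helper name is made up, and nothing is sent over the network.

```python
import json
from typing import Iterable, Tuple


def build_request(package: str, pins: Iterable[Tuple[str, str]]) -> str:
    """Serialize one test request; key names are assumed, not confirmed."""
    entry = {
        "package": package,
        # Each pin is a [packages, suite] pair, e.g. src:apparmor from experimental.
        "pin-packages": [list(pin) for pin in pins],
    }
    return json.dumps([entry])


# The suite (unstable) would be part of the request URL rather than this payload.
payload = build_request("apparmor", [("src:apparmor", "experimental")])
```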

>>>> Is there a better way for me to investigate?
>> 
>>> We have given DDs temporary access to one of our workers before. If
>>> you're interested we could do that again for this case. That way you
>>> could even skip the upload to experimental, assuming it reproduces if
>>> run from a local tree. And you can check what's going on in the test bed.
>> 
>> This looks great. I'd like to do this once the updated baseline is
>> established, if it still fails. I could book time for this on Nov
>> 28-29.
>
> Can you already point me at your public key? Easiest is with a
> signed e-mail, but if it's otherwise easily traceable it's from you, I
> can take it from elsewhere.

Sure, will do right away in a private email.

Cheers!



Bug#954655: apparmor autopkgtest doesn't work nice on ci.d.n infrastructure

2020-10-25 Thread Paul Gevers
Hi intrigeri,

On 25-10-2020 19:44, intrigeri wrote:
> This being said, this was a while ago, and I wonder if the problem got
> somehow fixed in one of those packages in the meantime. Could you
> please give it another try with 2.13.5-1 (sid) or 3.0.0-1
> (experimental), and ideally both? This would establish an updated
> baseline for further investigation.

Scheduled on amd64. Note however that I reported that the test didn't
always fail, so if it passes, it's not saying for sure that everything
is OK.

>>> Is there a better way for me to investigate?
> 
>> We have given DDs temporary access to one of our workers before. If
>> you're interested we could do that again for this case. That way you
>> could even skip the upload to experimental, assuming it reproduces if
>> run from a local tree. And you can check what's going on in the test bed.
> 
> This looks great. I'd like to do this once the updated baseline is
> established, if it still fails. I could book time for this on Nov
> 28-29.

Can you already point me at your public key? Easiest is with a
signed e-mail, but if it's otherwise easily traceable it's from you, I
can take it from elsewhere.

Paul



Bug#954655: apparmor autopkgtest doesn't work nice on ci.d.n infrastructure

2020-10-25 Thread intrigeri
Hi,

Paul Gevers (2020-05-25):
> The most obvious alternative is that you run it locally, but I guess
> you tried and couldn't reproduce?

I usually use the libvirt backend but I tried today with the lxc
backend locally, and could not reproduce. Note I don't see this
problem on Salsa CI either.

I took a closer look and the problem happens after successfully
running the compile-policy test. I can't imagine how what this test
does can interfere with shutting down the container, *but* that test
installs a bunch of packages:

  apparmor, apparmor-profiles-extra, bind9, cups-browsed, cups-daemon,
  evince, haveged, kopano-dagent, kopano-server, libreoffice-common,
  libvirt-daemon-system, man-db, ntp, onioncircuits, tcpdump, tor

I suspect one of those failed to stop within the 600s timeout in the
ci.d.n environment.
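
If one of those packages is indeed the culprit, the list could be narrowed down by bisection. The sketch below assumes exactly one package is responsible and that `reproduces` (a hypothetical callback that runs the test with only the given Depends) gives a reliable answer. Since the failure is intermittent, a real callback would have to repeat each run several times.

```python
from typing import Callable, List, Optional, Sequence


def find_culprit(
    packages: Sequence[str],
    reproduces: Callable[[List[str]], bool],
) -> Optional[str]:
    """Binary-search for the single package that triggers the problem.

    Returns None if the full list does not reproduce the problem at all.
    """
    candidates = list(packages)
    if not reproduces(candidates):
        return None
    while len(candidates) > 1:
        first_half = candidates[: len(candidates) // 2]
        if reproduces(first_half):
            candidates = first_half
        else:
            # By assumption the culprit must then be in the other half.
            candidates = candidates[len(first_half):]
    return candidates[0]


# Demo with placeholder names and an arbitrary stand-in for the real check.
demo = find_culprit(["pkg-a", "pkg-b", "pkg-c", "pkg-d", "pkg-e"],
                    lambda subset: "pkg-d" in subset)
```

With 16 packages this needs about 4 rounds after the initial confirmation, instead of 16 individual runs.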

This being said, this was a while ago, and I wonder if the problem got
somehow fixed in one of those packages in the meantime. Could you
please give it another try with 2.13.5-1 (sid) or 3.0.0-1
(experimental), and ideally both? This would establish an updated
baseline for further investigation.

>> Is there a better way for me to investigate?

> We have given DDs temporary access to one of our workers before. If
> you're interested we could do that again for this case. That way you
> could even skip the upload to experimental, assuming it reproduces if
> run from a local tree. And you can check what's going on in the test bed.

This looks great. I'd like to do this once the updated baseline is
established, if it still fails. I could book time for this on Nov
28-29.

Cheers!



Bug#954655: apparmor autopkgtest doesn't work nice on ci.d.n infrastructure

2020-05-25 Thread Paul Gevers
Hi intrigeri,

On 25-05-2020 11:18, intrigeri wrote:
> Thanks for letting me know — sorry for the delay in answering.

No problem.

> I don't really have a clue at this stage.

Ack.

> My approach would be to first figure out which one, among the 2 tests
> (compile-policy and test-installed), is causing the breakage.
> And if the problem lies in compile-policy, I'd like to check
> if the problem comes from a specific Depends of that test.
> 
> Ideally I would do that without doing uploads to sid merely for
> bisection purposes. I'm willing to do test uploads to experimental.

Right.

> In the debci self-service interface, it seems I could force debci to
> install all packages built from src:apparmor from experimental,
> which looks like what I need.
> 
> Now, to run those tests, I would need apparmor to be temporarily
> removed from the blacklist, and some coordination so that a ci.d.n
> maintainer can clean up whatever mess the tests create while the
> package is temporarily un-blacklisted.

With the coordination already required, you can just upload to
experimental and I/we can schedule the test if you ping us.

> I would be happy to book some time to work on this in
> a coordinated manner.

Great.

> Does this approach make sense to you?

The most obvious alternative is that you run it locally, but I guess
you tried and couldn't reproduce?

> Is there a better way for me to investigate?

We have given DDs temporary access to one of our workers before. If
you're interested we could do that again for this case. That way you
could even skip the upload to experimental, assuming it reproduces if
run from a local tree. And you can check what's going on in the test bed.

Paul



Bug#954655: apparmor autopkgtest doesn't work nice on ci.d.n infrastructure

2020-05-25 Thread intrigeri
Hi,

Paul Gevers (2020-03-22):
> I'm not sure what's going on, but I wanted to at least inform you that
> the apparmor autopkgtest is not working smoothly on the ci.debian.net
> infrastructure. Something in the test is very often preventing
> autopkgtest (the binary) from stopping and cleaning up the lxc container
> within the 600 seconds it gets to do that, which leads to a tmpfail for
> the apparmor autopkgtest and a still running lxc container on the
> worker. Obviously there's a bug somewhere in either lxc and/or
> autopkgtest, as you shouldn't be able to break the infrastructure in
> this way, but maybe you have a clue what could be the cause of this and
> help us to fix the underlying issue. Your autopkgtest itself normally
> passes before causing the issue.

Thanks for letting me know — sorry for the delay in answering.

I don't really have a clue at this stage.

My approach would be to first figure out which one, among the 2 tests
(compile-policy and test-installed), is causing the breakage.
And if the problem lies in compile-policy, I'd like to check
if the problem comes from a specific Depends of that test.

Ideally I would do that without doing uploads to sid merely for
bisection purposes. I'm willing to do test uploads to experimental.

In the debci self-service interface, it seems I could force debci to
install all packages built from src:apparmor from experimental,
which looks like what I need.

Now, to run those tests, I would need apparmor to be temporarily
removed from the blacklist, and some coordination so that a ci.d.n
maintainer can clean up whatever mess the tests create while the
package is temporarily un-blacklisted.

I would be happy to book some time to work on this in
a coordinated manner.

Does this approach make sense to you?
Is there a better way for me to investigate?

> One thing that may be required in your test if the test itself doesn't
> get updated is to mark it as isolation-machine

I agree this would be a better outcome than fully disabling all
testing of this package on debci (which is, understandably, the
current situation).

> although I'd like to understand the issue a bit better to know
> for sure.

Same!



Bug#954655: apparmor autopkgtest doesn't work nice on ci.d.n infrastructure

2020-03-22 Thread Paul Gevers
Source: apparmor
Version: 2.13.3-7
X-Debbugs-CC: debian...@lists.debian.org
Control: affects -1 autopkgtest
Control: affects -1 debci

Dear maintainer(s),

I'm not sure what's going on, but I wanted to at least inform you that
the apparmor autopkgtest is not working smoothly on the ci.debian.net
infrastructure. Something in the test is very often preventing
autopkgtest (the binary) from stopping and cleaning up the lxc container
within the 600 seconds it gets to do that, which leads to a tmpfail for
the apparmor autopkgtest and a still running lxc container on the
worker. Obviously there's a bug somewhere in either lxc and/or
autopkgtest, as you shouldn't be able to break the infrastructure in
this way, but maybe you have a clue what could be the cause of this and
help us to fix the underlying issue. Your autopkgtest itself normally
passes before causing the issue.

One thing that may be required in your test if the test itself doesn't
get updated is to mark it as isolation-machine although I'd like to
understand the issue a bit better to know for sure. We can see this
issue happening at least as far back as 2019-12-05 09:19:03 UTC on amd64
unstable, though I haven't checked all the logs.

Paul


