Bug#1057562: closed by Debian FTP Masters (reply to Jeremy Bícha) (Bug#1057562: fixed in gcr4 4.2.0-2)

2024-04-23 Thread Jeremy Bícha
On Tue, Apr 23, 2024 at 11:00 AM Colin Watson wrote:
> I've been attempting to debug this on an AWS instance provided by
> Santiago.  So far I'm afraid I can only report some partial progress,
> but I might as well write down what I've got so far.

Colin, thanks for looking into this issue. It also affects source gcr
(which is just the older version of gcr4).

If you have time, feel free to forward this issue to
https://gitlab.gnome.org/GNOME/gcr/-/issues/

Jeremy Bícha




2024-04-23 Thread Colin Watson
On Tue, Mar 12, 2024 at 03:48:24PM -0400, Jeremy Bícha wrote:
> Debian Policy does not say that it is a severity: serious bug because
> you are unable to compile gcr4 on your particular AWS instance. I
> understand that this bug appears serious to you. I also agree that
> there is a real bug in gcr. Perhaps the bug is a race condition.
> Fixing the issue that causes gcr build tests to fail 100% in your test
> case may also fix the flakiness issue seen on the official buildds.

I've been attempting to debug this on an AWS instance provided by
Santiago.  So far I'm afraid I can only report some partial progress,
but I might as well write down what I've got so far.

Whatever the bug is, it is highly sensitive to small perturbations.  For
instance, I found that commenting out non-failing g_test_add calls from
gck/test-gck-object.c:main (even those that run _after_ the tests that
typically fail) was enough to make it fail significantly less often.  I
suspect that this is just the effect of tweaking the state of hash
tables or a random number generator or something.
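Statements such as "fail significantly less often" are easier to compare across perturbations when the failure frequency is measured over a fixed number of runs. The following is a minimal sketch of such a measurement, not the procedure used in this thread: the helper name failure_rate and the default run count are hypothetical, and the command to run (for example the gck/test-gck-object binary from the build tree) is left to the caller.

```python
import subprocess
import sys


def failure_rate(cmd, runs=20):
    """Run cmd `runs` times and return (failures, runs, rate)."""
    failures = 0
    for _ in range(runs):
        # A non-zero return code covers both ordinary test failures and
        # crashes (a process killed by a signal has a negative return code).
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures, runs, failures / runs


# Stand-in command that always fails, so the sketch runs anywhere;
# replace it with the path of the real test binary.
always_fails = [sys.executable, "-c", "raise SystemExit(1)"]
failed, total, rate = failure_rate(always_fails, runs=3)
print(f"{failed}/{total} runs failed ({rate:.0%})")
```

Running the same measurement before and after a change (such as commenting out a g_test_add call) gives two numbers to compare, although with a bug this timing-sensitive the measurement itself may shift the result.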

More unfortunately, attaching almost any kind of debugging tool seems to
perturb timing such that the problem is no longer reproducible; in
particular I was unable to reproduce failures under gdb.  The best I
could do was to generate a core dump, as follows:

  $ gdb gck/test-gck-object core
  GNU gdb (Debian 13.2-1+b1) 13.2
  Copyright (C) 2023 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later 
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.
  Type "show copying" and "show warranty" for details.
  This GDB was configured as "x86_64-linux-gnu".
  Type "show configuration" for configuration details.
  For bug reporting instructions, please see:
  .
  Find the GDB manual and other documentation resources online at:
  .
  
  For help, type "help".
  Type "apropos word" to search for commands related to "word"...
  Reading symbols from gck/test-gck-object...
  [New LWP 31755]
  [New LWP 31753]
  [New LWP 31754]
  [New LWP 31751]
  [New LWP 31756]
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
  Core was generated by `/home/cjwatson/gcr4-4.2.0/obj-x86_64-linux-gnu/gck/test-gck-object'.
  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  0x7f4fef3a5633 in find_attribute (attr_type=3, n_attrs=12008468691120727718, attrs=0x55bef952d90a) at ../gck/gck-attributes.c:336
  336 if (attrs[i].type == attr_type)
  [Current thread is 1 (Thread 0x7f4fed97a6c0 (LWP 31755))]
  (gdb) thread apply all bt
  
  Thread 5 (Thread 0x7f4fed1796c0 (LWP 31756)):
  #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  #1  0x7f4fef2ffc90 in g_cond_wait_until () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #2  0x7f4fef26e143 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #3  0x7f4fef2d24ba in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #4  0x7f4fef2d1ab1 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #5  0x7f4fef08f45c in start_thread (arg=) at ./nptl/pthread_create.c:444
  #6  0x7f4fef10fbbc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

  Thread 4 (Thread 0x7f4fee9c4a00 (LWP 31751)):
  #0  0x7f4fef102abf in __GI___poll (fds=0x55bba2e90cb0, nfds=1, timeout=500) at ../sysdeps/unix/sysv/linux/poll.c:29
  #1  0x7f4fef2a4277 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #2  0x7f4fef2a4c1f in g_main_loop_run () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #3  0x55bba2564664 in loop_wait_until (timeout=) at ../egg/egg-testing.c:310
  #4  0x55bba2562a4d in test_find_objects (test=0x55bba2e8f5c0, unused=) at ../gck/test-gck-object.c:403
  #5  0x7f4fef2cf71e in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #6  0x7f4fef2cf513 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #7  0x7f4fef2cf513 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #8  0x7f4fef2cfc32 in g_test_run_suite () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #9  0x7f4fef2cfcb8 in g_test_run () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #10 0x55bba2564b96 in egg_tests_run_with_loop () at ../egg/egg-testing.c:326
  #11 0x55bba256268e in main (argc=, argv=) at ../gck/test-gck-object.c:426

  Thread 3 (Thread 0x7f4fee17b6c0 (LWP 31754)):
  #0  0x7f4fef102abf in __GI___poll (fds=0x55bba2e904b0, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
  #1  0x7f4fef2a4277 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #2  0x7f4fef2a4930 in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #3  0x7f4fef2a4981 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #4  0x7f4fef2d1ab1 in  () at 
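
The n_attrs value in frame #0 of the crashing thread is easier to interpret in hexadecimal. The short sketch below only decodes the number printed by gdb; it does not use any gcr code. The interpretation that follows it is an assumption and is not confirmed anywhere in this thread.

```python
# Value of n_attrs copied from frame #0 of the crashing thread in the
# gdb output above.
n_attrs = 12008468691120727718

# View the 64-bit value as its eight individual bytes.
raw = n_attrs.to_bytes(8, byteorder="little")

print(hex(n_attrs))        # 0xa6a6a6a6a6a6a6a6
print(len(set(raw)) == 1)  # True: every byte is the same (0xa6)
```

A count made of one repeated byte is unlikely to be a genuine attribute count. One possible reading is that find_attribute was looking at memory that had been freed or never initialised and had been filled with a debugging pattern (glibc's MALLOC_PERTURB_ setting produces fills of this kind), but that remains a guess until it is checked against the actual test environment.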


2024-03-12 Thread Jeremy Bícha
On Tue, Mar 12, 2024 at 3:32 PM Santiago Vila wrote:
> This is a violation of a *must* directive in Policy, because
> Debian policy says this:
>
> If build-time dependencies are specified, it must be possible to build the 
> package and produce working binaries on a system with only essential and 
> build-essential packages installed and also those required to satisfy the 
> build-time relationships (including any implied relationships).

I disagree with your interpretation of Debian Policy. It is clearly
possible to build gcr and gcr4 on a system with only essential and
build-essential installed. See
https://buildd.debian.org/status/package.php?p=gcr4

Debian Policy does not say that it is a severity: serious bug because
you are unable to compile gcr4 on your particular AWS instance. I
understand that this bug appears serious to you. I also agree that
there is a real bug in gcr. Perhaps the bug is a race condition.
Fixing the issue that causes gcr build tests to fail 100% in your test
case may also fix the flakiness issue seen on the official buildds.

One problem with your insistence on declaring this bug serious was
that it put epiphany-browser on the auto-removal list. It was not that
critical. (That issue is obsolete since key packages are now using
gcr4.)

Unfortunately, the Debian GNOME team is too small for the amount of
work to be done and the number of open bugs. This bug is not so severe
that it requires my immediate attention compared to everything else
nor is it easy enough that I can fix it in a few minutes. Instead of
complaining about the severity the maintainer has assigned to the bug,
I guess you could try fixing it?

Thank you,
Jeremy Bícha




2024-03-12 Thread Santiago Vila

On 3/3/24 at 17:38, Jeremy Bícha wrote:
> Control: severity -1 important
> Control: affects -1 src:gcr
>
> On Fri, Mar 1, 2024 at 6:36 AM Santiago Vila wrote:
> > > Ignore build test failures on s390x (Closes: #1057562)
> >
> > This is wrong for several reasons.
> >
> > - The bug report did not say anything about s390x.
>
> s390x is the only architecture where the flakiness of the gcr:gck /
> object build test is severe enough to unreasonably interfere with
> timely building of gcr & gcr4.


You keep misrepresenting the bug I reported.

The bug report was not about a flaky test in a generic sense.

And it was not about the effect of such a flaky test on the official buildds
either.

What the bug report said is that whenever I try to build the package
on several virtual machines from AWS of different types, the build fails.

ALWAYS.

So, this is not about an occasional failure. This is about a permanent failure.

Here is my build history:

Status: failed  gcr4_4.1.0-2_amd64-20230322T215308.971Z
Status: failed  gcr4_4.1.0-2_amd64-20230322T215447.211Z
Status: failed  gcr4_4.1.0-2_amd64-20230922T202053.367Z
Status: failed  gcr4_4.1.0-2_amd64-20230922T202058.856Z
Status: failed  gcr4_4.1.0-2_amd64-20230922T202121.876Z
Status: failed  gcr4_4.1.0-2_amd64-20230922T202342.316Z
Status: failed  gcr4_4.1.0-2_amd64-20230922T202518.944Z
Status: failed  gcr4_4.1.0-2_amd64-20230922T202604.013Z
Status: failed  gcr4_4.1.0-2_amd64-20230922T202834.489Z
Status: failed  gcr4_4.1.0-2_amd64-20230922T202921.776Z
Status: failed  gcr4_4.1.0-2_amd64-20230930T214526.762Z
Status: failed  gcr4_4.1.0-2_amd64-20230930T215238.884Z
Status: failed  gcr4_4.1.0-2_amd64-20230930T215413.813Z
Status: failed  gcr4_4.1.0-2_amd64-20230930T215359.115Z
Status: failed  gcr4_4.1.0-2_amd64-20230930T215627.137Z
Status: failed  gcr4_4.1.0-2_amd64-20230930T215937.574Z
Status: failed  gcr4_4.1.0-2_amd64-20230930T221515.395Z
Status: failed  gcr4_4.1.0-2_amd64-20230930T221739.810Z
Status: failed  gcr4_4.1.0-2_amd64-20231130T041100.202Z
Status: failed  gcr4_4.1.0-2_amd64-20231205T160411.736Z
Status: failed  gcr4_4.1.0-2_amd64-20231213T155044.129Z
Status: failed  gcr4_4.1.0-2_amd64-20231213T160911.601Z
Status: failed  gcr4_4.1.0-2_amd64-20240129T170626.580Z
Status: failed  gcr4_4.1.0-2_amd64-20240129T170930.627Z
Status: failed  gcr4_4.1.0-2_amd64-20240129T171059.244Z
Status: failed  gcr4_4.1.0-2_amd64-20240129T171811.347Z
Status: failed  gcr4_4.2.0-3_amd64-20240301T110606.957Z
Status: failed  gcr4_4.2.0-3_amd64-20240301T110607.721Z
Status: failed  gcr4_4.2.0-3_amd64-20240301T110608.780Z
Status: failed  gcr4_4.2.0-3_amd64-20240301T110608.127Z
Status: failed  gcr4_4.2.0-3_amd64-20240301T110609.765Z
Status: failed  gcr4_4.2.0-3_amd64-20240301T110718.547Z
Status: failed  gcr4_4.2.0-3_amd64-20240301T110719.934Z
Status: failed  gcr4_4.2.0-3_amd64-20240301T110721.878Z

This is a violation of a *must* directive in Policy, because
Debian policy says this:

If build-time dependencies are specified, it must be possible to build the 
package and produce working binaries on a system with only essential and 
build-essential packages installed and also those required to satisfy the 
build-time relationships (including any implied relationships).


> Debian is full of packages where the build tests and autopkgtests are
> flaky enough to be disruptive to someone expecting 100% pass rates.


This is a strawman. I'm *not* expecting a 100% pass rate.

A pass rate of 100% would correspond to a failure rate of 0%.

I do not expect a failure rate of 0%.

I just expect a failure rate which is *not* 100%.
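
The distinction between a pass rate of 100% and a failure rate of 100% can be checked mechanically against a build history like the one listed earlier in this message. The following is a minimal sketch: the function name summarize is hypothetical, and the sample contains only the first two and last two entries of that history.

```python
import re

# First two and last two entries of the build history quoted earlier in
# this message; the entries in between have the same shape and status.
BUILD_LOG = """\
Status: failed  gcr4_4.1.0-2_amd64-20230322T215308.971Z
Status: failed  gcr4_4.1.0-2_amd64-20230322T215447.211Z
Status: failed  gcr4_4.2.0-3_amd64-20240301T110719.934Z
Status: failed  gcr4_4.2.0-3_amd64-20240301T110721.878Z
"""


def summarize(log):
    """Return (failed, total) for lines of the form 'Status: <word> <name>'."""
    statuses = re.findall(r"^Status:\s+(\S+)", log, flags=re.MULTILINE)
    failed = sum(1 for status in statuses if status == "failed")
    return failed, len(statuses)


failed, total = summarize(BUILD_LOG)
failure_rate = failed / total
pass_rate = 1 - failure_rate
print(f"failure rate {failure_rate:.0%}, pass rate {pass_rate:.0%}")
```

For this sample the failure rate is 100% and the pass rate is 0%, which is the situation described here; an occasionally flaky test would give a failure rate strictly between the two extremes.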


> There is currently no one on the Debian GNOME team with the time to
> investigate and properly fix these issues.


In that case the failing test is completely useless and even harmful.

When we have unit tests, they are enabled with the aim of
"doing something" when they fail. If we do nothing when they
fail, we are wasting the time of everybody involved.


> The occasional failure is enough to remind us of this issue.


If this is really about "reminding", I can set up a cron job to
email you a reminder weekly or monthly. Even a Post-it note on your
monitor would be better than this.

But not this.


> I don't
> think it's helpful to just disable the test since it may disguise a
> real problem that someone can work on once they get sufficiently
> annoyed by the issue.


Well, but this is *already* a real problem: the package does not
build on some systems. This is not an occasional failure; it does not build at all.


> Also, it may be important to recognize if the
> failure rate for this test on official buildds increases
> significantly. The failure rate did increase on s390x but we don't
> really treat s390x as a supported Desktop architecture and don't have
> capacity to spend much time dealing with s390x.


And you keep mentioning s390x when the original bug report did not
mention s390x at all.

In fact, 


2024-03-03 Thread Jeremy Bícha
Control: severity -1 important
Control: affects -1 src:gcr

On Fri, Mar 1, 2024 at 6:36 AM Santiago Vila wrote:
> > Ignore build test failures on s390x (Closes: #1057562)
>
> This is wrong for several reasons.
>
> - The bug report did not say anything about s390x.

s390x is the only architecture where the flakiness of the gcr:gck /
object build test is severe enough to unreasonably interfere with
timely building of gcr & gcr4.

Debian is full of packages where the build tests and autopkgtests are
flaky enough to be disruptive to someone expecting 100% pass rates.
There is currently no one on the Debian GNOME team with the time to
investigate and properly fix these issues.

The occasional failure is enough to remind us of this issue. I don't
think it's helpful to just disable the test since it may disguise a
real problem that someone can work on once they get sufficiently
annoyed by the issue. Also, it may be important to recognize if the
failure rate for this test on official buildds increases
significantly. The failure rate did increase on s390x but we don't
really treat s390x as a supported Desktop architecture and don't have
capacity to spend much time dealing with s390x.

I am demoting this bug because it is a valid bug, but also not severe
enough to get gcr excluded from Debian Testing.

Thank you,
Jeremy Bícha




2024-03-01 Thread Santiago Vila

reopen 1057562
found 1057562 4.2.0-3
thanks


> Ignore build test failures on s390x (Closes: #1057562)


This is wrong for several reasons.

- The bug report did not say anything about s390x.

- This is still happening on version 4.2.0-3. I can still reproduce
  it 100% of the time here, and I'm using amd64. Build log attached.

- I included a paragraph saying this which has been completely ignored:


> If you could not reproduce the bug please contact me privately, as I
> am willing to provide ssh access to a virtual machine where the bug is
> fully reproducible.

- Even if you don't care about the end user being able to build
the package from source and your only concern is flakiness on the
official buildds (which is not a good way to "fix" the problem, btw),
this is still flaky on the official buildds:

https://buildd.debian.org/status/fetch.php?pkg=gcr4&arch=amd64&ver=4.2.0-3&stamp=1709067945&raw=0


My offer for a machine where this happens 100% of the time still holds
if you need it.

Thanks.

gcr4_4.2.0-3_amd64-20240301T110721.878Z.gz
Description: application/gzip