Bug#871446: jemalloc: FTBFS on hurd-i386: aligned_alloc test hangs

2023-01-06 Thread Faidon Liambotis
Hi there,

On Sun, Apr 04, 2021 at 09:57:39PM +0300, Faidon Liambotis wrote:
> On Sun, Apr 04, 2021 at 02:26:16AM +0200, Samuel Thibault wrote:
> > So basically libpthread is trying to initialize itself, calls malloc,
> > which initializes jemalloc, which calls pthread_self, which is not happy
> > that libpthread is not initialized yet, thus calls assert, which tries
> > to malloc as well, which tries (again!) to initialize jemalloc, and
> > gets stuck on mutex_lock. And since this is all happening at very early
> > initialization of libc, interaction with ps etc. is not possible yet.
> > 
> > [...]
> >
> > I'm wondering how this kind of bootstrap issue is solved on Linux? The
> > _dl_allocate_tls code is exactly the same.
> 
> Thanks for looking into this! I'm really out of my depth here. Don't
> assume that the platform settings in configure.ac are the right ones
> either -- I just guesstimated them, and the problem may just as well
> be something there.
> 
> I'd suggest reaching out to upstream directly on either a GitHub issue,
> or their Gitter channel (they're responsive in my experience). Note that
> I haven't sent them debian/patches/hurd.patch as it hasn't been
> functional so far, so it may be worth prefacing your communication with
> the configure.ac settings that we've chosen.
> 
> If you succeed in figuring out the root cause and making jemalloc
> build, happy to prepare a PR to upstream this.

I'm wondering if you've had any chance to look into this. I've just
uploaded 5.3.0-1 to unstable (after a brief stay in experimental, as
5.3.0-1~exp1), and was looking over FTBFSes and open bugs.

Thanks!
Faidon



Bug#871446: jemalloc: FTBFS on hurd-i386: aligned_alloc test hangs

2021-04-04 Thread Faidon Liambotis
On Sun, Apr 04, 2021 at 02:26:16AM +0200, Samuel Thibault wrote:
> So basically libpthread is trying to initialize itself, calls malloc,
> which initializes jemalloc, which calls pthread_self, which is not happy
> that libpthread is not initialized yet, thus calls assert, which tries
> to malloc as well, which tries (again!) to initialize jemalloc, and
> gets stuck on mutex_lock. And since this is all happening at very early
> initialization of libc, interaction with ps etc. is not possible yet.
> 
> [...]
>
> I'm wondering how this kind of bootstrap issue is solved on Linux? The
> _dl_allocate_tls code is exactly the same.

Thanks for looking into this! I'm really out of my depth here. Don't
assume that the platform settings in configure.ac are the right ones
either -- I just guesstimated them, and the problem may just as well
be something there.

I'd suggest reaching out to upstream directly on either a GitHub issue,
or their Gitter channel (they're responsive in my experience). Note that
I haven't sent them debian/patches/hurd.patch as it hasn't been
functional so far, so it may be worth prefacing your communication with
the configure.ac settings that we've chosen.

If you succeed in figuring out the root cause and making jemalloc
build, happy to prepare a PR to upstream this.

Regards,
Faidon



Bug#871446: jemalloc: FTBFS on hurd-i386: aligned_alloc test hangs

2021-04-03 Thread Samuel Thibault
Hello,

Faidon Liambotis, on Mon, 11 Jan 2021 01:33:34 +0200, wrote:
> I'm still unsure why either test fails the way it does¹. Like the last
> time I was debugging this bug... gdb'ing the aligned-alloc test doesn't
> work (can't interrupt execution). What's worse is that even running "ps"
> in a different console hangs while the test is running.

That's because the process is hung hard, see 
https://www.gnu.org/software/hurd/faq/ps_hangs.html
Using the -M option gets less information but doesn't hang.

> So something seems weird with the system, that's not jemalloc
> related...

It is :)

Attaching with gdb from outside, I get:

#0  0x0111e69c in mach_msg_trap () at ./build-tree/hurd-i386-libc/mach/mach_msg_trap.S:2
#1  0x0111ee46 in __GI___mach_msg (msg=0x103281c, option=3, send_size=64, rcv_size=32, rcv_name=51, timeout=0, notify=0) at msg.c:111
#2  0x01577612 in __gsync_wait (task=1, addr=17642300, val1=2, val2=0, msec=0, flags=0) at ./build-tree/hurd-i386-libc/mach/RPC_gsync_wait.c:175
#3  0x010f1923 in __pthread_mutex_lock (mtxp=0x10d333c ) at ../sysdeps/mach/hurd/htl/pt-mutex-lock.c:36
#4  0x01086308 in malloc_mutex_lock_final (mutex=0x10d3300 ) at include/jemalloc/internal/mutex.h:155
#5  je_malloc_mutex_lock_slow (mutex=0x10d3300 ) at src/mutex.c:85
#6  0x0103f7bc in malloc_mutex_lock (mutex=0x10d3300 , tsdn=0x0) at include/jemalloc/internal/mutex.h:221
#7  malloc_init_hard () at src/jemalloc.c:1740
#8  0x01041d65 in malloc_init () at src/jemalloc.c:210
#9  imalloc_init_check (dopts=, sopts=) at src/jemalloc.c:2230
#10 imalloc (dopts=, sopts=) at src/jemalloc.c:2261
#11 je_malloc_default (size=100) at src/jemalloc.c:2290
#12 0x010423a2 in malloc (size=) at src/jemalloc.c:2389
#13 0x011af9a5 in __vasprintf_internal (result_ptr=0x1032b24, format=0x12d39a4 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", args=0x1032ae8 "\357\352,\001\357\352,\001'f\017\001\034", mode_flags=0) at vasprintf.c:45
#14 0x0118c367 in ___asprintf (string_ptr=0x1032b24, format=0x12d39a4 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n") at asprintf.c:31
#15 0x0116302a in __assert_fail_base (fmt=0x12d39a4 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x10f6631 "self != NULL", file=0x10f6627 "pt-self.c", line=28, function=0x10f6640 <__PRETTY_FUNCTION__.1> "__pthread_self") at assert.c:57
#16 0x01163129 in __GI___assert_fail (assertion=0x10f6631 "self != NULL", file=0x10f6627 "pt-self.c", line=28, function=0x10f6640 <__PRETTY_FUNCTION__.1> "__pthread_self") at assert.c:101
#17 0x010f12cf in __pthread_self () at pt-self.c:28
#18 __pthread_self () at pt-self.c:25
#19 0x0103f58d in malloc_init_hard_needed () at src/jemalloc.c:1455
#20 malloc_init_hard () at src/jemalloc.c:1746
#21 0x01041d65 in malloc_init () at src/jemalloc.c:210
#22 imalloc_init_check (dopts=, sopts=) at src/jemalloc.c:2230
#23 imalloc (dopts=, sopts=) at src/jemalloc.c:2261
#24 je_malloc_default (size=708) at src/jemalloc.c:2290
#25 0x010423a2 in malloc (size=) at src/jemalloc.c:2389
#26 0x010f054d in __pthread_alloc (pthread=0x1032cb0) at pt-alloc.c:125
#27 0x010f0884 in __pthread_create_internal (thread=0x1032cf8, attr=0x0, start_routine=0x0, arg=0x0) at pt-create.c:99
#28 0x010f4a3b in _init_routine (stack=0x0) at ../sysdeps/mach/hurd/htl/pt-sysdep.c:73
#29 0x01154c14 in init (data=0x1032d60) at ../sysdeps/mach/hurd/i386/init-first.c:209
#30 _dl_init_first (argc=) at ../sysdeps/mach/hurd/i386/init-first.c:325
#31 0x220d in _dl_start_user () from /lib/ld.so

So basically libpthread is trying to initialize itself, calls malloc,
which initializes jemalloc, which calls pthread_self, which is not happy
that libpthread is not initialized yet, thus calls assert, which tries
to malloc as well, which tries (again!) to initialize jemalloc, and
gets stuck on mutex_lock. And since this is all happening at very early
initialization of libc, interaction with ps etc. is not possible yet.
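
The re-entrancy described above can be reproduced in miniature. The
following is only a sketch under stated assumptions, not jemalloc's or
glibc's code: every name in it (toy_malloc, boot_alloc, in_init, ...) is
hypothetical, and it assumes single-threaded early startup, so a plain
flag is enough to detect the nested call. It uses
pthread_mutex_trylock() to record that the nested call finds the init
lock already held, which is the point where a plain
pthread_mutex_lock() would typically hang, and serves the nested
request from a small static pool instead.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; this is not jemalloc's or glibc's code. */

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static bool initialized;        /* set once init has completed */
static bool in_init;            /* set while init runs (assumes one thread) */
static int nested_would_block;  /* nested calls that found init_lock held */

/* Tiny static pool used to serve allocations in this toy. */
static _Alignas(16) unsigned char boot_pool[1024];
static size_t boot_used;

static void *boot_alloc(size_t size) {
    size_t rounded = (size + 15u) & ~(size_t)15u;  /* round up to 16 bytes */
    if (rounded < size || rounded > sizeof(boot_pool) - boot_used)
        return NULL;
    void *p = boot_pool + boot_used;
    boot_used += rounded;
    return p;
}

static void *toy_malloc(size_t size);

/* Stand-in for a dependency (such as an assertion handler) that allocates. */
static void *dependency_that_allocates(void) {
    return toy_malloc(100);
}

static void *toy_malloc(size_t size) {
    if (!initialized) {
        if (in_init) {
            /* Re-entered from within init. A plain pthread_mutex_lock()
             * here would typically block forever on a default mutex;
             * trylock only records that the lock is already held. */
            if (pthread_mutex_trylock(&init_lock) == EBUSY)
                nested_would_block++;
            else
                pthread_mutex_unlock(&init_lock);
            return boot_alloc(size);
        }
        pthread_mutex_lock(&init_lock);
        in_init = true;
        void *p = dependency_that_allocates();  /* re-enters toy_malloc */
        assert(p != NULL);
        in_init = false;
        initialized = true;
        pthread_mutex_unlock(&init_lock);
    }
    return boot_alloc(size);  /* the toy keeps using the pool afterwards */
}
```

Calling toy_malloc(32) once runs the init path, re-enters
toy_malloc(100) from the stand-in dependency, and records the nested
call in nested_would_block. Whether a bootstrap pool of this kind is a
workable fix for the Hurd case is not established here.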

I tried to make __pthread_alloc avoid using malloc, but then I got
instead

#24 je_malloc_default (size=4348) at src/jemalloc.c:2290
#25 0x010423a2 in malloc (size=) at src/jemalloc.c:2389
#26 0x00013b08 in _dl_allocate_tls_storage () at dl-tls.c:403
#27 0x00013e65 in _dl_allocate_tls (mem=0x0) at dl-tls.c:588
#28 0x010e1a0e in __pthread_create_internal (thread=0x1032cf8, attr=0x0, start_routine=0x0, arg=0x0) at pt-create.c:151
#29 0x010e5b1b in _init_routine (stack=0x0) at ../sysdeps/mach/hurd/htl/pt-sysdep.c:73
#30 0x01154c14 in init (data=0x1032d60) at ../sysdeps/mach/hurd/i386/init-first.c:209
#31 _dl_init_first (argc=) at ../sysdeps/mach/hurd/i386/init-first.c:325
#32 0x220d in _dl_start_user () from /lib/ld.so

Thus the same issue, and changing _dl_allocate_tls is a way more
involved thing. I tried another approach by making pthread_self() return
the id of the initial thread without checks, but then I get a crash on

#0  __pthread_mutex_lock (mtxp=0x10ed1a0 <__pthread_key_lock>) at ../sysdeps/mach/hurd/htl/pt-mutex-lock.c:41
#1  

Bug#871446: jemalloc: FTBFS on hurd-i386: aligned_alloc test hangs

2021-01-10 Thread Faidon Liambotis
On Sun, Dec 06, 2020 at 06:14:04PM +0100, Samuel Thibault wrote:
> Aaron M. Ucko, on Fri, 11 Aug 2017 17:56:38 -0400, wrote:
> > only about a dozen source packages build-depend on
> > jemalloc.
> 
> Just as an additional, updated data point: inkscape is now also using
> jemalloc, so that raises the number to a couple hundred reverse
> build dependencies.

Well... I made another attempt with 5.2.1-2. It failed to build with the
same issue as described on this page on the "ironforge" host.

I also tried building on a qemu-kvm VM that I spun up (started from
your image, dist-upgrading it to current sid first).

As you can see in the 5.2.1-2 source, I first had to work around another
failure, different from this bug's. It was tsd's
test_tsd_sub_thread, which failed also due to pthread_join(), which, as
far as I could tell from my debugging, was returning before all threads
were joined(?). So I disabled that to make some progress, but it looks
like we got stuck again on the aligned-alloc one.

I'm still unsure why either test fails the way it does¹. Like the last
time I was debugging this bug... gdb'ing the aligned-alloc test doesn't
work (can't interrupt execution). What's worse is that even running "ps"
in a different console hangs while the test is running. So something
seems weird with the system, that's not jemalloc related...

(Fun fact: I even tried Googling for "hurd pthread_join", and imagine my
surprise when I found that the first result is from an open issue page
in gnu.org[1], which... starts with an IRC log from 2013 where I(!) was
asking about pthread_join related issues I had in another package...)

All in all, I think it's up to Hurd porters at this point. I've given it
my best, but I think I'm running into Hurd bugs that I'm unable to debug
myself.

Regards,
Faidon

1: 
https://www.gnu.org/software/hurd/open_issues/libpthread_cancellation_points.html



Bug#871446: jemalloc: FTBFS on hurd-i386: aligned_alloc test hangs

2020-12-06 Thread Samuel Thibault
Hello,

Aaron M. Ucko, on Fri, 11 Aug 2017 17:56:38 -0400, wrote:
> only about a dozen source packages build-depend on
> jemalloc.

Just as an additional, updated data point: inkscape is now also using
jemalloc, so that raises the number to a couple hundred reverse
build dependencies.

Samuel



Bug#871446: jemalloc: FTBFS on hurd-i386: aligned_alloc test hangs

2017-08-11 Thread Aaron M. Ucko
Faidon Liambotis writes:

> In general, jemalloc is (copying from the package description): "a
> library providing a malloc(3) implementation for multi-threaded
> processes on multi-processor systems". Given that Hurd right now doesn't
> even support SMP, I wonder if there is even much point in going deep
> down that rabbithole in an attempt to fix all of those issues.

That's fair, particularly given that the Hurd is not a release
architecture and only about a dozen source packages build-depend on
jemalloc.  Thanks anyway for looking into it!  (Sorry I can't suggest
patches myself; I'm not deeply familiar with the Hurd, just like to call
attention to portability issues in general.)

-- 
Aaron M. Ucko, KB1CJC (amu at alum.mit.edu, ucko at debian.org)
http://www.mit.edu/~amu/ | http://stuff.mit.edu/cgi/finger/?a...@monk.mit.edu



Bug#871446: jemalloc: FTBFS on hurd-i386: aligned_alloc test hangs

2017-08-11 Thread Faidon Liambotis
severity 871446 wishlist
thanks

Hi Aaron,

On Mon, Aug 07, 2017 at 10:25:13PM -0400, Aaron M. Ucko wrote:
> Thanks for taking care of #828871!  jemalloc now compiles on
> hurd-i386, but the build still ultimately fails because the
> aligned_alloc test hangs (hard):

I spent a lot of time at DebCamp trying to debug this on exodar before
uploading and it was failing with the same symptom. The test hung to
the point where neither Ctrl+C nor Ctrl+\ worked, while the session
remained open. I tried logging in from another terminal, ran "ps" and
that kept hanging as well(!), with no output.

In the end, I thought it was some odd kernel behavior of the porterbox
and uploaded the package, in the hope that it would work on the buildd.
Turns out that it didn't :(

I'm pretty clueless about Hurd and in over my head, so I'm afraid there's
not much more I can do, especially when my debugging is hindered by the
fact that even the most basic tools (like ps) aren't working...

Note that I'm staging 5.0.1 for experimental and tried to build that on
Hurd too, but that FTBFSes for another reason: Hurd is missing
pthread_getaffinity_np. I tried replacing that by sched_getaffinity (a
noop on Hurd, apparently) but that build failed the test suite, for a
seemingly different reason.
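
As an illustration of that kind of substitution, here is a minimal
sketch (not the patch that was tried, and not jemalloc's code) of
counting usable CPUs via sched_getaffinity() with a fallback to
sysconf(_SC_NPROCESSORS_ONLN). The helper name usable_cpus is
hypothetical, and the sketch assumes that sched_getaffinity() may
either fail or report an empty set on a platform where it is
effectively a no-op.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* for sched_getaffinity() and the CPU_* macros */
#endif
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper: number of CPUs this process may run on, at least 1. */
static long usable_cpus(void) {
#if defined(CPU_COUNT)
    cpu_set_t set;
    CPU_ZERO(&set);
    /* pid 0 means "the calling thread". */
    if (sched_getaffinity(0, sizeof(set), &set) == 0) {
        int n = CPU_COUNT(&set);
        if (n > 0)
            return n;
    }
#endif
    /* Fallback when the affinity call is missing, fails, or reports
     * an empty set. */
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}
```

The fallback to 1 keeps callers from ever seeing a zero or negative CPU
count, whichever path is taken.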

In general, jemalloc is (copying from the package description): "a
library providing a malloc(3) implementation for multi-threaded
processes on multi-processor systems". Given that Hurd right now doesn't
even support SMP, I wonder if there is even much point in going deep
down that rabbithole in an attempt to fix all of those issues.

I certainly don't have the knowledge or the bandwidth to do so, but if
you folks disagree and manage to make it work, I wouldn't mind carrying
(or even forwarding upstream) such patches.

Regards,
Faidon



Bug#871446: jemalloc: FTBFS on hurd-i386: aligned_alloc test hangs

2017-08-07 Thread Aaron M. Ucko
Source: jemalloc
Version: 3.6.0-10
Severity: important
Justification: fails to build from source

Thanks for taking care of #828871!  jemalloc now compiles on
hurd-i386, but the build still ultimately fails because the
aligned_alloc test hangs (hard):

  === test/integration/aligned_alloc ===
  
  Session terminated, terminating shell...Killed
  Test harness error
  Makefile:344: recipe for target 'check' failed
  make[1]: *** [check] Error 1
  make[1]: Leaving directory '/<>'
  dh_auto_test: make -j1 check VERBOSE=1 returned exit code 2
  debian/rules:46: recipe for target 'build-arch' failed
  make: *** [build-arch] Error 2
  dpkg-buildpackage: error: debian/rules build-arch gave error exit status 2
  E: Build killed with signal TERM after 180 minutes of inactivity
  E: Build killed with signal KILL after 5 minutes of inactivity
  E: Build killed with signal KILL after 10 minutes of inactivity

On a possibly related note, I see some warnings regarding that test,
though jemalloc should still fail gracefully on excessive values.

  test/integration/aligned_alloc.c: In function 'test_oom_errors':
  test/integration/aligned_alloc.c:43:4: warning: argument 2 value '2147483648' 
exceeds maximum object size 2147483647 [-Walloc-size-larger-than=]
p = aligned_alloc(alignment, size);
~~^~~~
  In file included from test/include/test/jemalloc_test.h:1:0,
   from test/integration/aligned_alloc.c:1:
  /usr/include/stdlib.h:470:14: note: in a call to allocation function 
'aligned_alloc' declared here
   extern void *aligned_alloc (size_t __alignment, size_t __size)
^
  test/integration/aligned_alloc.c:56:4: warning: argument 2 value '3221225473' 
exceeds maximum object size 2147483647 [-Walloc-size-larger-than=]
p = aligned_alloc(alignment, size);
~~^~~~
  In file included from test/include/test/jemalloc_test.h:1:0,
   from test/integration/aligned_alloc.c:1:
  /usr/include/stdlib.h:470:14: note: in a call to allocation function 
'aligned_alloc' declared here
   extern void *aligned_alloc (size_t __alignment, size_t __size)
^
  test/integration/aligned_alloc.c:68:4: warning: argument 2 value '4294967280' 
exceeds maximum object size 2147483647 [-Walloc-size-larger-than=]
p = aligned_alloc(alignment, size);
~~^~~~
  In file included from test/include/test/jemalloc_test.h:1:0,
   from test/integration/aligned_alloc.c:1:
  /usr/include/stdlib.h:470:14: note: in a call to allocation function 
'aligned_alloc' declared here
   extern void *aligned_alloc (size_t __alignment, size_t __size)
^
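
A minimal sketch of the "fail gracefully" expectation, assuming a C11
libc: an oversized aligned_alloc() request should come back as NULL
rather than hang or crash. The helper name is hypothetical and this is
not the jemalloc test itself. The size used is the largest multiple of
the alignment that fits in size_t, which for an alignment of 16 on a
32-bit target is the 4294967280 seen in the last warning.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if an oversized aligned_alloc() request is
 * refused with NULL, 0 if it unexpectedly succeeds. */
static int oversized_request_is_refused(size_t alignment) {
    /* This sketch only handles power-of-two alignments. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Largest multiple of the alignment that fits in size_t. volatile keeps
     * the compiler from seeing a constant argument, which is what triggers
     * -Walloc-size-larger-than. */
    volatile size_t size = SIZE_MAX - (SIZE_MAX % alignment);

    void *p = aligned_alloc(alignment, size);
    if (p == NULL)
        return 1;
    free(p);
    return 0;
}
```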

Could you please take a look?  You can find the full log at
https://buildd.debian.org/status/fetch.php?pkg=jemalloc&arch=hurd-i386&ver=3.6.0-10&stamp=1502088411&raw=0

Thanks!

-- 
Aaron M. Ucko, KB1CJC (amu at alum.mit.edu, ucko at debian.org)
http://www.mit.edu/~amu/ | http://stuff.mit.edu/cgi/finger/?a...@monk.mit.edu