Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved) [details of a specific qemu-arm-static source code problem]
[I listed my /usr/src svn version information instead of /usr/ports. Correcting. . .]

On 2018-Dec-31, at 12:05, Mark Millard wrote:

> On 2018-Dec-31, at 10:16, Jonathan Chen wrote:
> . . .
> Native build attempts on an armv7 get the same.
>
> But I'm still at:
>
> . . .

Correcting to have the /usr/ports information:

# svnlite info /usr/ports/ | grep "Re[plv]"
Relative URL: ^/head
Repository Root: svn://svn.freebsd.org/ports
Repository UUID: 35697150-7ecd-e111-bb59-0022644237b5
Revision: 484783
Last Changed Rev: 484783

>
> because I froze at that while investigating the reliable hang and
> have not started progressing again yet. Last I looked, the
> head-armv7-default package builds were also failing for libvpx, if
> I remember right.
It looks like libvpx has more recently been building successfully on the package builders. So the next time that I update the ports tree I'll get to see the next problem (if any).

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
___
freebsd-toolchain@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
To unsubscribe, send any mail to "freebsd-toolchain-unsubscr...@freebsd.org"
Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved) [details of a specific qemu-arm-static source code problem]
On 2018-Dec-31, at 10:16, Jonathan Chen wrote:

> On Mon, 31 Dec 2018 at 21:05, Mark Millard wrote:
> [...]
>> But if you have a form of hang-up that shows no sign of being tied
>> to kevent or hangs-up only sometimes, I'd be surprised if the __packed
>> change(s) would fix the issue.
>
> With the __packed-modified qemu-user-static, the amd64->armv7
> crossbuilds do not hang anymore, but I get build failures instead.
> Interestingly enough, an unmodified qemu-user-static gets further
> along in an amd64->armv6 crossbuild, with only one reproducible hang.

I tend to compare cross-build failures to native-build attempts. The multimedia/gstreamer1-qt@qt5 hang-up was qemu-arm-static specific, not occurring native. That, and its hanging up reliably, is what prompted the investigation.

The lld thread fan-out hang-up has also only happened under qemu-arm-static, but I do not have an armv7 context with more than 4 cores: far fewer than the 28 (FreeBSD under Hyper-V) or 32 (FreeBSD native) cpus that I use for cross-builds.

I do not know if you care to, but it is possible to check whether the FreeBSD package builders get failures or hangs for the same ports. I use head port build examples below:

http://beefy16.nyi.freebsd.org/jail.html?mastername=head-armv7-default

http://beefy8.nyi.freebsd.org/jail.html?mastername=head-armv6-default

The pages displayed show a list of builds named like p??_s??, combining the ports version (p??) and the FreeBSD version (s??). Those links take you to pages for exploring the built, failed, skipped, and ignored ports.

Of course, for race-condition problems in builds, checking is messier because of needing to look at possibly many port/system combinations.
My attempts to build x11/lumina fail for:

[00:01:02] [01] [00:00:00] Building multimedia/libvpx | libvpx-1.7.0_2
[00:02:23] [01] [00:01:21] Saved multimedia/libvpx | libvpx-1.7.0_2 wrkdir to: /usr/local/poudriere/data/wrkdirs/FBSDFSSDjailArmV7-default/default/libvpx-1.7.0_2.tar
[00:02:23] [01] [00:01:21] Finished multimedia/libvpx | libvpx-1.7.0_2: Failed: build
[00:02:24] [01] [00:01:22] Skipping multimedia/ffmpeg | ffmpeg-4.1,1: Dependent port multimedia/libvpx | libvpx-1.7.0_2 failed
[00:02:24] [01] [00:01:22] Skipping multimedia/gstreamer1-libav | gstreamer1-libav-1.14.4_2: Dependent port multimedia/libvpx | libvpx-1.7.0_2 failed
[00:02:24] [01] [00:01:22] Skipping multimedia/gstreamer1-plugins-core | gstreamer1-plugins-core-1.14: Dependent port multimedia/libvpx | libvpx-1.7.0_2 failed
[00:02:24] [01] [00:01:22] Skipping x11/lumina | lumina-1.4.1,3: Dependent port multimedia/libvpx | libvpx-1.7.0_2 failed
[00:02:24] [01] [00:01:22] Skipping x11/lumina-core | lumina-core-1.4.1: Dependent port multimedia/libvpx | libvpx-1.7.0_2 failed
. . .
[00:06:19] Failed ports: multimedia/libvpx:build
[00:06:19] Skipped ports: multimedia/ffmpeg multimedia/gstreamer1-libav multimedia/gstreamer1-plugins-core x11/lumina x11/lumina-core
[FBSDFSSDjailArmV7-default] [2018-12-30_17h04m02s] [committing:] Queued: 7 Built: 1 Failed: 1 Skipped: 5 Ignored: 0 Tobuild: 0 Time: 00:06:16

Native build attempts on an armv7 get the same.

But I'm still at:

# svnlite info | grep "Re[plv]"
Relative URL: ^/head
Repository Root: svn://svn.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 341836
Last Changed Rev: 341836

because I froze at that while investigating the reliable hang and have not started progressing again yet. Last I looked, the head-armv7-default package builds were also failing for libvpx, if I remember right.
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved) [details of a specific qemu-arm-static source code problem]
On Mon, 31 Dec 2018 at 21:05, Mark Millard wrote:
[...]
> But if you have a form of hang-up that shows no sign of being tied
> to kevent or hangs-up only sometimes, I'd be surprised if the __packed
> change(s) would fix the issue.

With the __packed-modified qemu-user-static, the amd64->armv7 crossbuilds do not hang anymore, but I get build failures instead. Interestingly enough, an unmodified qemu-user-static gets further along in an amd64->armv6 crossbuild, with only one reproducible hang.

Cheers.
-- 
Jonathan Chen
Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved) [details of a specific qemu-arm-static source code problem]
On 2018-Dec-30, at 21:01, Jonathan Chen wrote:

> On Mon, 31 Dec 2018 at 14:34, Mark Millard via freebsd-ports
> wrote:
>>
>> [Removing __packed did make the size and offsets match armv7
>> and the build worked based on the reconstructed qemu-arm-static.]
>
> Thanks for the analysis Mark! I've been suffering quite a few hangups
> with my ports crossbuilds on amd64->armv7 on 12-STABLE, and I'll be
> trying your suggestions to see whether it resolves the issue.

If you have something like a kqread state for a hang-up that consistently happens in the same place, then Mikael Urankar's fix (or any other way of getting the right sizes and field offsets for kevent) has a chance of fixing what you have observed.

But if you have a form of hang-up that shows no sign of being tied to kevent or hangs-up only sometimes, I'd be surprised if the __packed change(s) would fix the issue.

I've seen such racy hang-ups from lld's creation of (#cpu)+2 threads, as FreeBSD counts cpus. I've selectively forced -Wl,--no-threads at times in specific contexts to avoid that. binutils ld does not tolerate the option. ports does not appear to have an equivalent of:

LDFLAGS.lld+= -Wl,--no-threads

that would be lld specific.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved) [details of a specific qemu-arm-static source code problem]
On Mon, 31 Dec 2018 at 14:34, Mark Millard via freebsd-ports wrote:
>
> [Removing __packed did make the size and offsets match armv7
> and the build worked based on the reconstructed qemu-arm-static.]

Thanks for the analysis Mark! I've been suffering quite a few hangups with my ports crossbuilds on amd64->armv7 on 12-STABLE, and I'll be trying your suggestions to see whether it resolves the issue.

Cheers.
-- 
Jonathan Chen
Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved) [details of a specific qemu-arm-static source code problem]
[Removing __packed did make the size and offsets match armv7 and the build worked based on the reconstructed qemu-arm-static.]

On 2018-Dec-30, at 16:38, Mark Millard wrote:

> On 2018-Dec-28, at 12:12, Mark Millard wrote:
> . . .
> But qemu-user-static has for translation purposes:
>
> struct target_freebsd_kevent {
>     abi_ulong  ident;
>     int16_t    filter;
>     uint16_t   flags;
>     uint32_t   fflags;
>     int64_t    data;
>     abi_ulong  udata;
>     uint64_t   ext[4];
> } __packed;
>
> (note the __packed) . . .
Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved) [details of a specific qemu-arm-static source code problem]
On 2018-Dec-28, at 12:12, Mark Millard wrote:

> On 2018-Dec-28, at 05:13, Michal Meloun wrote:
>
>> Mark,
>> this is a known problem with qemu-user-static.
>> Emulation of every single interruptible syscall is broken by design (it
>> has signal-related races). These races cannot be solved without a major
>> rewrite of the syscall emulation code.
>> Unfortunately, nobody actively works on this, I think.
>>
>
> Thanks for the note setting some expectations.
> . . .

It turns out that I've been through (part of?) this before and mikael.uran...@gmail.com had back then provided a qemu-user-static patch (that might have been arm specific or 32-bit target specific when running on a 64-bit host). (The qemu-user-static code structure seems to have changed some afterwards and the patch is no longer where he had pointed me to back then.)

To show size and offsets on armv7 vs. amd64 for struct kevent I use:

# more kevent_size_offsets.c
#include "/usr/include/sys/event.h" // kevent
#include <stddef.h> // offsetof
#include <stdio.h>  // printf

int
main()
{
    printf("%lu\n", (unsigned long) sizeof(struct kevent));
    printf("ident %lu\n", (unsigned long) offsetof(struct kevent, ident));
    printf("filter %lu\n", (unsigned long) offsetof(struct kevent, filter));
    printf("flags %lu\n", (unsigned long) offsetof(struct kevent, flags));
    printf("fflags %lu\n", (unsigned long) offsetof(struct kevent, fflags));
    printf("data %lu\n", (unsigned long) offsetof(struct kevent, data));
    printf("udata %lu\n", (unsigned long) offsetof(struct kevent, udata));
    printf("ext %lu\n", (unsigned long) offsetof(struct kevent, ext));
    return 0;
}

It ends up showing on armv7 (under qemu-arm-static instead of native, not that it matters here):

# ./a.out
64
ident 0
filter 4
flags 6
fflags 8
data 16
udata 24
ext 32

On amd64 (native) it ends up as:

# ./a.out
64
ident 0
filter 8
flags 10
fflags 12
data 16
udata 24
ext 32

Thus a translation of layout is required when hosted.
This is for:

struct kevent {
    __uintptr_t     ident;      /* identifier for this event */
    short           filter;     /* filter for event */
    unsigned short  flags;      /* action flags for kqueue */
    unsigned int    fflags;     /* filter flag value */
    __int64_t       data;       /* filter data value */
    void            *udata;     /* opaque user data identifier */
    __uint64_t      ext[4];     /* extensions */
};

But qemu-user-static has for translation purposes:

struct target_freebsd_kevent {
    abi_ulong  ident;
    int16_t    filter;
    uint16_t   flags;
    uint32_t   fflags;
    int64_t    data;
    abi_ulong  udata;
    uint64_t   ext[4];
} __packed;

(note the __packed) for which amd64's qemu-arm-static has the size and offsets:

# gdb qemu-arm-static
. . .
(gdb) p/d sizeof(struct target_freebsd_kevent)
$1 = 56
(gdb) p/d &((struct target_freebsd_kevent *)0)->ident
$2 = 0
(gdb) p/d &((struct target_freebsd_kevent *)0)->filter
$3 = 4
(gdb) p/d &((struct target_freebsd_kevent *)0)->flags
$4 = 6
(gdb) p/d &((struct target_freebsd_kevent *)0)->fflags
$5 = 8
(gdb) p/d &((struct target_freebsd_kevent *)0)->data
$6 = 12
(gdb) p/d &((struct target_freebsd_kevent *)0)->udata
$7 = 20
(gdb) p/d &((struct target_freebsd_kevent *)0)->ext
$8 = 24

which does not match the armv7 offsets for data, udata, or ext, and does not have the right size for struct target_freebsd_kevent[] indexing to match armv7's struct kevent[] indexing.

This in turn makes the do_freebsd_kevent code do the wrong thing in its:

    struct target_freebsd_kevent *target_changelist, *target_eventlist;
. . .
    for (i = 0; i < arg3; i++) {
        __get_user(changelist[i].ident, &target_changelist[i].ident);
        __get_user(changelist[i].filter, &target_changelist[i].filter);
        __get_user(changelist[i].flags, &target_changelist[i].flags);
        __get_user(changelist[i].fflags, &target_changelist[i].fflags);
        __get_user(changelist[i].data, &target_changelist[i].data);
        /* __get_user(changelist[i].udata, &target_changelist[i].udata); */
#if TARGET_ABI_BITS == 32
        changelist[i].udata = (void *)(uintptr_t)target_changelist[i].udata;
        tswap32s((uint32_t *)&changelist[i].udata);
#else
        changelist[i].udata = (void *)(uintptr_t)target_changelist[i].udata;
        tswap64s((uint64_t *)&changelist[i].udata);
#endif
        __get_user(changelist[i].ext[0], &target_changelist[i].ext[0]);
        __get_user(changelist[i].ext[1], &target_changelist[i].ext[1]);
        __get_user(changelist[i].ext[2], &target_changelist[i].ext[2]);
        __get_user(changelist[i].ext[3], &target_changelist[i].ext[3]);
    }
. . .
    for (i = 0; i < arg5; i++) {
        __put_user(eventlist[i].ident, &target_eventlist[i].ident);
        __put_user(eventlist[i].filter, &target_eventlist[i].filter);
        __put_user(eventlist[i].flags, &target_eventlist[i].flags);
Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)
On 2018-Dec-28, at 12:12, Mark Millard wrote:

> On 2018-Dec-28, at 05:13, Michal Meloun wrote:
>
>> Mark,
>> this is a known problem with qemu-user-static.
>> Emulation of every single interruptible syscall is broken by design (it
>> has signal-related races). These races cannot be solved without a major
>> rewrite of the syscall emulation code.
>> Unfortunately, nobody actively works on this, I think.
>>
>
> Thanks for the note setting some expectations.
>
> On the evidence that I have I expect that more is going on than that:
>
> A) The hang-up always happens and always in the same place. So
> it would appear that no race is involved.
>
> B) (A) is true even when varying the number of builders in parallel
> (so other builds are also happening) and the number of jobs allowed per
> builder. It also fails with only one builder allowed only one process.
> (I get traces from that last kind of context.)
>
> C) The problem started on the package-building servers for armv7
> and armv6 without qemu-user-static having an update (FreeBSD and
> cmake had updates, for example).
>
> D) The problem is only observed when targeting armv7 and armv6, as
> far as I can tell. I've never seen it for aarch64, neither in my
> own builds nor when I looked at the package-building server
> history.
>
> At least that is what got me started. (I've since learned that
> qemu-user-static uses fork in place of a requested vfork.)
>
> My ktrace/kdump experiment yesterday showed something odd for the
> kevent that hangs in cmake:
>
> 93172 qemu-arm-static CALL kevent(0x3,0x7ffe7d40,0x2,0x7ffd7d40,0x400,0)
> 93172 qemu-arm-static STRU struct kevent[] = { { ident=6, filter=EVFILT_READ, flags=0x1, fflags=0, data=0, udata=0x0 } { ident=0x0, filter=, flags=0, fflags=0x8, data=0x1, udata=0x0 } }
>
> Note the 0x2 argument to kevent and the apparently-odd 2nd entry in the struct
> kevent[]. The kevent use is from cmake.
>
> So far I've not identified a signal being delivered at a time that would seem
> to me likely to contribute. (But this is not familiar code, so my judgment
> is likely not the best.)
>
> Note: I normally run FreeBSD using a non-debug kernel, even when using
> head. (The kernel does have symbols.)

The details of the signal usage leading up to the hang-up, starting from just before the "press return" for the "make FLAVOR=qt5" command that I had entered:

The only "Interrupted system call" prior to my killing the hung cmake process was (kdump -H -r -S output):

93172 100717 qemu-arm-static CALL execve[59](0x10392,0x8605051a0,0x860cf5400)
93172 101706 qemu-arm-static RET nanosleep[240] -1 errno 4 Interrupted system call
93172 100717 qemu-arm-static NAMI "/bin/sh"
93172 100717 sh RET execve[59] JUSTRETURN
93172 100717 sh CALL readlink[58](0x207a65,0x7fffccc0,0x400)

This is where ninja (via qemu-arm-static) execve's the amd64-native /bin/sh (to in turn later run cmake via qemu-arm-static). (This was after the fork [for the requested vfork].) So it is for the close-down of the thread that was in nanosleep.

There were no PSIG's and no sigreturn's prior to the kill according to the kdump output.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)
[Using ktrace/kdump shows an apparent oddity in the kevent use that hangs up in cmake, not that I know it causes the hang-up.]

On 2018-Dec-28, at 00:16, Mark Millard wrote:

> [The historical notes are removed and replaced by partial trace
> information from example hang-ups, not that I've figured out
> what contributes yet.]
>
> I ran into the following while trying to get evidence
> about the hang-up for an amd64->armv7 cross-build of
> multimedia/gstreamer1-qt@qt5 .
>
> The following is from trying to get evidence for the hang-up
> via a manual run of "make multimedia/gstreamer1-qt FLAVOR=qt5"
> in a poudriere bulk -i's interactive mode for the context
> that has the hang-up in normal poudriere-devel runs.
>
>
> From top after the hang-up (to identify some context):
>
> 14528 root 2 52 0 100M 24M 0 kqread 11 0:00 0.00% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.
> 14527 root 2 52 0 88M 13M 0 select 22 0:00 0.00% /usr/local/bin/qemu-arm-static ninja -j1 -v all
>
> from ps -auxd as well (to identify more context):
>
> root 10114 0.0 0.0  10328  1756 1 I+J 13:47 0:00.01 | `-- make FLAVOR=qt5
> root 14526 0.0 0.0  10204  1792 1 I+J 13:50 0:00.00 |   `-- /bin/sh -e -c (cd /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! /usr/bin/env QT_SELE
> root 14527 0.0 0.0  90304 13084 1 I+J 13:50 0:00.09 |     `-- /usr/local/bin/qemu-arm-static ninja -j1 -v all
> root 14528 0.0 0.0 102876 25060 1 IJ  13:50 0:00.12 |       `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/g
>
> I had made a qemu-user-static that enabled do_strace when
> it is used to run cmake or ninja.
>
> The only do_strace lines from qemu-arm-static running cmake
> or ninja mentioning process 14528 are included in the sequence:
>
> (Before the below was a long list of "14527 fstatat" lines.
> I'll note that "Unknown syscall 545" is from ppoll use.)
>
> 82400 sigprocmask(1,-1610620016,-191968524,-186261416,0,24) = 0
> 82400 sigaction(2,-1610620040,-191968596,-186261584,210460,0) = 0
> 82400 sigaction(15,-1610620040,-191968572,-186261584,210460,0) = 0
> 82400 sigaction(1,-1610620040,-191968548,-186261584,210460,0) = 0
> 82400 gettimeofday(-1610619984,0,4,-186261584,-1610619440,-1610619528) = 0
> 82400 gettimeofday(-1610619984,0,4,359949,1545969996,0) = 0
> 82400 gettimeofday(-1610620120,0,4,2,-184666112,-1610619520) = 0
> 82400 fstatat(-100,"elements/gstqtvideosink/CMakeFiles", 0x9fffe200, 0) = 0
> 82400 fstatat(-100,"elements/gstqtvideosink/gstqt5videosink_autogen", 0x9fffe200, 0) = 0
> 82400 pipe2(-1610620176,0,-1610620108,0,-1610620120,167084) = 0
> 82400 fcntl(5,1,-1610620108,-185863932,-192200556,-1610620228) = 0
> 82400 fcntl(5,2,1,-185863932,-192200556,-1610620228) = 0
> 82400 vfork(0,66450,-186876196,-1610620184,-1610620240,0) = 82401
> 82400 close(6) = 0
> = 0
> 82400 Unknown syscall 545
> 82401 setpgid(0,0,-186876196,-1610620184,-1610620240,0) = 0
> 82401 sigprocmask(3,-191586912,0,-1610620184,-1610620240,0) = 0
> 82401 close(5) = 0
> 82401 open("/dev/null",0,0) = 5
> 82401 dup2(5,0,0,-1610620184,-1610620240,0) = 0
> 82401 close(5) = 0
> 82401 fcntl(0,2,0,-1610620184,-1610620240,0) = 0
> 82401 dup2(6,1,0,-1610620184,-1610620240,0) = 1
> 82401 fcntl(1,2,0,-1610620184,-1610620240,0) = 0
> 82401 dup2(6,2,0,-1610620184,-1610620240,0)82400 sigpending(-1610620072,1,0,-191968524,0,0) = 0
>
> The vfork then close(6) sequence for 82400 vs. the later
> use of 6 in dup2 in 82401 may be rather odd. But it looks
> like qemu-*-static uses do_freebsd_fork to implement
> do_freebsd_vfork, despite reporting vfork before
> calling do_freebsd_vfork. (Does the close(6) appear to
> indicate a race for native operation of ninja for the
> period when the address space is shared?)
>
> Ninja has Subprocess::Start code that has:
>
> #ifdef POSIX_SPAWN_USEVFORK
>   flags |= POSIX_SPAWN_USEVFORK;
> #endif
>
>
>   if (posix_spawnattr_setflags(&attr, flags) != 0)
>     Fatal("posix_spawnattr_setflags: %s", strerror(errno));
>
>   const char* spawned_args[] = { "/bin/sh", "-c", command.c_str(), NULL };
>   if (posix_spawn(&pid_, "/bin/sh", &action, &attr,
>                   const_cast<char**>(spawned_args), environ) != 0)
>     Fatal("posix_spawn: %s", strerror(errno));
>
> that is in use here. I think that this explains the vfork use.
>
>
> It turns out that putting the hung-up build in the background
> and then killing 82401 with the likes of kill -6 leads to more
> output that had apparently been buffered. It shows the use of
> the (amd64 native) /bin/sh that in turn leads to
> /usr/local/bin/cmake via qemu-arm-static. /bin/sh, being
> native, gets no do_strace output from qemu-arm-static.
>
> 82400
Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)
[I built a FreeBSD head -r340288 context and tried ports head -r484783 and the problem repeated.]

On 2018-Dec-22, at 12:55, Mark Millard wrote:

> [I found my E-mail records reporting successful builds using
> qemu-user-static from ports head -r484783 under FreeBSD
> head -r340287.]
>
> On 2018-Dec-22, at 00:10, Mark Millard wrote:
>
>> [I messed up the freebsd-emulation email address the first time I sent
>> this. I also forgot to indicate the qemu-user-static vintage relationship.]
>>
>> I had been reporting intermittent hang-ups for my amd64->{aarch64,armv7}
>> port cross builds in another message sequence. But it turns out that one
>> thing I ran into has hung up every time, the same way, for amd64->armv7
>> cross builds: multimedia/gstreamer1-qt@qt5 . So I extract the material
>> here into a separate report with some updated notes.
>>
>> A little context: I had built from ports head -r484783 before under FreeBSD
>> head -r340287 (as I remember the version). Back then it did not have this
>> problem that it now has under FreeBSD head -r341836 . One ports-specific
>> change was to force perl5.28 as the default instead of perl5.26 originally.
>> In fact this is what drives what is being rebuilt for my experiment that
>> caught this. But I doubt the perl version is important to the problem. The
>> context has a Ryzen Threadripper 1950X and has been tested both for FreeBSD
>> under Hyper-V and for the same media native-booted. Both hang up at the
>> same point as seen via ps or top. The native tools for cross-build speedup
>> were in use. Cross-builds targeting aarch64 did not get this problem but
>> targeting armv7 did. 121 of 129 armv7 ports built before the hang-up for
>> the first armv7 try.
>>
>> ADDED: The qemu-user-static back with head -r340287 before installing the
>> updated ports would likely be different than the -r484783 vintage. So both
>> FreeBSD and qemu-user-static may have changed over the comparison.
>
> CORRECTION to ADDED: Back on 2018-Nov-11 I reported successful cross-builds
> based on qemu-user-static from ports head -r484783 --all built under FreeBSD
> head -r340287 . So the use of perl5.28 as the forced default and the
> newer FreeBSD head version -r341836 as the context are the differences here.
>
>> The hang-up:
>>
>> In the port rebuilds targeting armv7, multimedia/gstreamer1-qt@qt5 hung up
>> and timed out. Looking during the wait in later tries shows something much
>> like (from one of the examples):
>>
>> root 33719 0.0 0.0  12920  3528 0 I  11:40 0:00.03 | | `-- sh: poudriere[FBSDFSSDjailArmV7-default][02]: build_pkg (gstreamer1-qt5-1.2.0_14) (sh)
>> root 41551 0.0 0.0  12920  3520 0 I  11:43 0:00.00 | |   `-- sh: poudriere[FBSDFSSDjailArmV7-default][02]: build_pkg (gstreamer1-qt5-1.2.0_14) (sh)
>> root 41552 0.0 0.0  10340  1744 0 IJ 11:43 0:00.01 | |     `-- /usr/bin/make -C /usr/ports/multimedia/gstreamer1-qt FLAVOR=qt5 build
>> root 41566 0.0 0.0  10236  1796 0 IJ 11:43 0:00.00 | |       `-- /bin/sh -e -c (cd /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! /usr/bin/env QT_SELE
>> root 41567 0.0 0.0  89976 12896 0 IJ 11:43 0:00.07 | |         `-- /usr/local/bin/qemu-arm-static ninja -j28 -v all
>> root 41585 0.0 0.0 102848 25056 0 IJ 11:43 0:00.10 | |           |-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/g
>> root 41586 0.0 0.0 102852 25072 0 IJ 11:43 0:00.11 | |           `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/g
>>
>> or as top showed it:
>>
>> 41552 root 1 52 0  10M 1744K 0 wait    15 0:00 0.00% /usr/bin/make -C /usr/ports/multimedia/gstreamer1-qt FLAVOR=qt5 build
>> 41566 root 1 52 0  10M 1796K 0 wait     1 0:00 0.00% /bin/sh -e -c (cd /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! /usr/bin/env QT_SELECT=qt5 QMAKEMODULES
>> 41567 root 2 52 0  88M   13M 0 select   4 0:00 0.00% /usr/local/bin/qemu-arm-static ninja -j28 -v all
>> 41585 root 2 52 0 100M   24M 0 kqread   8 0:00 0.00% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.
>> 41586 root 2 52 0 100M   24M 0 kqread  22 0:00 0.00% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.
>>
>> So: waiting in kqread trying to run cmake.
>>
>> Unlike some intermittent hang-ups, attaching-then-detaching via gdb does not
>> resume the
Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)
[I found my E-mail records reporting successful builds using qemu-user-static from ports head -r484783 under FreeBSD head -r340287.]

On 2018-Dec-22, at 00:10, Mark Millard wrote:

> ADDED: The qemu-user-static back with head -r340287 before installing the
> updated ports would likely be different than the -r484783 vintage. So both
> FreeBSD and qemu-user-static may have changed over the comparison.

CORRECTION to ADDED: Back on 2018-Nov-11 I reported successful cross-builds based on qemu-user-static from ports head -r484783, all built under FreeBSD head -r340287. So the use of perl5.28 as the forced default and the newer FreeBSD head -r341836 as the context are the differences here.
Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)
[I messed up the freebsd-emulation email address the first time I sent this. I also forgot to indicate the qemu-user-static vintage relationship.]

I had been reporting intermittent hang-ups for my amd64->{aarch64,armv7} port cross-builds in another message sequence. But it turns out that one thing I ran into has hung up every time, the same way, for amd64->armv7 cross-builds: multimedia/gstreamer1-qt@qt5. So I extract the material here into a separate report with some updated notes.

A little context: I had built from ports head -r484783 before under FreeBSD head -r340287 (as I remember the version). Back then it did not have the problem that it now has under FreeBSD head -r341836. One ports-specific change was to force perl5.28 as the default instead of the original perl5.26. In fact, this is what drives what is being rebuilt in the experiment that caught this, but I doubt the perl version is important to the problem.

The context is a Ryzen Threadripper 1950X, tested both with FreeBSD under Hyper-V and with the same media native-booted. Both hang up at the same point, as seen via ps or top. The native tools for cross-build speedup were in use. Cross-builds targeting aarch64 did not get this problem, but those targeting armv7 did: on the first armv7 try, 121 of 129 armv7 ports built before the hang-up.

ADDED: The qemu-user-static in use back under head -r340287, before I installed the updated ports, was likely a different vintage than -r484783. So both FreeBSD and qemu-user-static may have changed over the comparison.

The hang-up:

In the port rebuilds targeting armv7, multimedia/gstreamer1-qt@qt5 hung up and timed out.
Looking during the wait in later tries shows something much like the following (from one of the examples), as ps showed it:

root 33719 0.0 0.0  12920  3528 0 I  11:40 0:00.03 | | `-- sh: poudriere[FBSDFSSDjailArmV7-default][02]: build_pkg (gstreamer1-qt5-1.2.0_14) (sh)
root 41551 0.0 0.0  12920  3520 0 I  11:43 0:00.00 | |   `-- sh: poudriere[FBSDFSSDjailArmV7-default][02]: build_pkg (gstreamer1-qt5-1.2.0_14) (sh)
root 41552 0.0 0.0  10340  1744 0 IJ 11:43 0:00.01 | |     `-- /usr/bin/make -C /usr/ports/multimedia/gstreamer1-qt FLAVOR=qt5 build
root 41566 0.0 0.0  10236  1796 0 IJ 11:43 0:00.00 | |       `-- /bin/sh -e -c (cd /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! /usr/bin/env QT_SELE
root 41567 0.0 0.0  89976 12896 0 IJ 11:43 0:00.07 | |         `-- /usr/local/bin/qemu-arm-static ninja -j28 -v all
root 41585 0.0 0.0 102848 25056 0 IJ 11:43 0:00.10 | |           |-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/g
root 41586 0.0 0.0 102852 25072 0 IJ 11:43 0:00.11 | |           `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/g

or as top showed it:

41552 root 1 52 0  10M 1744K 0 wait    15 0:00 0.00% /usr/bin/make -C /usr/ports/multimedia/gstreamer1-qt FLAVOR=qt5 build
41566 root 1 52 0  10M 1796K 0 wait     1 0:00 0.00% /bin/sh -e -c (cd /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! /usr/bin/env QT_SELECT=qt5 QMAKEMODULES
41567 root 2 52 0  88M   13M 0 select   4 0:00 0.00% /usr/local/bin/qemu-arm-static ninja -j28 -v all
41585 root 2 52 0 100M   24M 0 kqread   8 0:00 0.00% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.
41586 root 2 52 0 100M   24M 0 kqread  22 0:00 0.00% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.

So: the qemu-arm-static processes running cmake are waiting in kqread.

Unlike some intermittent hang-ups, attaching-then-detaching via gdb does not resume the hung-up processes.
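In the top listing, the kqread STATE is what separates the two stuck cmake processes from their parents, which are merely waiting (wait, select) on children. The following is a minimal Python sketch of spotting that pattern mechanically in captured top output. It assumes the column layout of the listing (PID USERNAME THR PRI NICE SIZE RES SWAP STATE C TIME WCPU COMMAND, i.e. a SWAP column between RES and STATE); the function name find_kqread_hangs is hypothetical and not part of any FreeBSD or poudriere tool.

```python
from typing import List, Tuple

# Assumption: each process line follows the column layout seen in the top listing:
# PID USERNAME THR PRI NICE SIZE RES SWAP STATE C TIME WCPU COMMAND
STATE_FIELD = 8
COMMAND_FIELD = 12


def find_kqread_hangs(top_text: str,
                      emulator: str = "/usr/local/bin/qemu-arm-static") -> List[Tuple[int, str]]:
    """Return (pid, command) pairs for emulated processes whose STATE is kqread."""
    hits = []
    for line in top_text.splitlines():
        # Split at most 12 times so the COMMAND column keeps its internal spaces.
        fields = line.split(None, COMMAND_FIELD)
        if len(fields) <= COMMAND_FIELD or not fields[0].isdigit():
            continue  # header, blank, or wrapped continuation line
        state, command = fields[STATE_FIELD], fields[COMMAND_FIELD]
        if state == "kqread" and command.startswith(emulator):
            hits.append((int(fields[0]), command))
    return hits


if __name__ == "__main__":
    sample = """\
41567 root 2 52 0  88M  13M 0 select  4 0:00 0.00% /usr/local/bin/qemu-arm-static ninja -j28 -v all
41585 root 2 52 0 100M  24M 0 kqread  8 0:00 0.00% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen
41586 root 2 52 0 100M  24M 0 kqread 22 0:00 0.00% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen
"""
    for pid, command in find_kqread_hangs(sample):
        print(pid, command)
```

Restricting the match to commands launched through qemu-arm-static keeps native processes that legitimately sleep in kqread out of the result; a process that stays in this list across repeated samples is a candidate for the hang described here.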
Killing the processes waiting in kqread stops the build.

Given that the prior ports have already been built, building just multimedia/gstreamer1-qt@qt5 still gets the hang-up at the same point. Building anything that requires multimedia/gstreamer1-qt@qt5 seems to be solidly blocked in my environment.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)

___
freebsd-toolchain@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
To unsubscribe, send any mail to "freebsd-toolchain-unsubscr...@freebsd.org"