Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

2019-01-05 Thread Jilles Tjoelker
On Fri, Jan 04, 2019 at 07:56:42AM +0100, Michal Meloun wrote:
> On 29.12.2018 18:47, Dennis Clarke wrote:
> > On 12/28/18 9:56 PM, Mark Millard via freebsd-arm wrote:
> >>
> >> On 2018-Dec-28, at 12:12, Mark Millard  wrote:
> >>
> >>> On 2018-Dec-28, at 05:13, Michal Meloun 
> >>> wrote:
> >>>
> >>>> Mark,
> >>>> this is a known problem with qemu-user-static.
> >>>> Emulation of every single interruptible syscall is broken by design (it
> >>>> has signal-related races). These races cannot be solved without a major
> >>>> rewrite of the syscall emulation code.
> >>>> Unfortunately, nobody is actively working on this, I think.
> 

> > Following along here quietly and I had to blink at this a few times.
> > Is there a bug report somewhere within the qemu world related to this
> >  'broken by design' qemu feature?

> Firstly, I apologize for the late answer. Writing a technically accurate but
> still comprehensible report is extremely difficult for me.

> The major design issue with qemu-user is the fact that guest (blocking /
> interruptible) syscalls must be emulated atomically, including the
> delivery of asynchronous signals (including signals originating from
> other threads).
> This is something that cannot be emulated precisely by a user-mode
> program without specific kernel support. Let me explain this in a
> little more detail.

> [snip]

> This looks much better. The code blocks all signals first, then checks
> whether any signal is pending. If so, it does a non-blocking select()
> (because the timeout is zero) and correctly returns EINTR immediately.
> Otherwise, it uses another variant of select(), pselect(), which installs
> the right signal mask itself.
> That means the syscall is called with signal delivery blocked, but the
> kernel installs the right sigmask before it waits for an event. While this
> looks like a perfect solution that closes all races from the first version,
> it is not. pselect() has different semantics than select(): it
> doesn't update the timeout argument. So this solution is also inappropriate.

FreeBSD select() never updates the passed timeout. When emulating Linux
syscalls, this will have to be done manually.

> Moreover, I think we don't have p*() equivalents (like pselect()) for all
> blocking syscalls.

We definitely do not. For example, open() has no equivalent with a
signal mask.

> Mark, I hope that this is also the answer to your question posted to
> hackers@ and also the explanation of why you see the hang.

> Linux uses a different approach to overcome this issue, safe_syscall ->
> https://gitlab.collabora.com/tomeu/qemu/commit/4d330cee37a21aabfc619a1948953559e66951a4
> It looks like a workable workaround, but I'm not sure about ERESTART
> versus EINTR return values. IMHO, this can be a problem.

This looks like a reasonable solution. Musl libc uses the same approach
to implement pthread cancellation (where with the default "deferred"
cancellation type, cancellation takes effect at cancellation points
only, which include most blocking system calls; if a cancellation
request comes in at the same time as a blocking cancellation point
system call starts, the same race condition needs to be avoided).

As for ERESTART vs EINTR, EINTR can be treated like any other error. On
the other hand, ERESTART (or variants like ERESTARTSYS) is never
returned by the kernel, but instead causes the kernel to rewind the
program counter (so the system call instruction will be executed again)
just before invoking the signal handler. Therefore, when the host kernel
does this to qemu, qemu must do the same to the guest.

If a signal is delivered just before qemu makes a system call on behalf
of the guest, this may look like ERESTART. This is fine since it looks
the same as if the signal was delivered just before the guest's system
call instruction.

The approach as used by FreeBSD libc to implement pthread cancellation
(thr_wake(2) on self in the signal handler) will not let you distinguish
between ERESTART and EINTR, so you would have to replicate that
determination (which typically but not always depends on the signal's
SA_RESTART flag and which system call it is). Therefore, I would not
recommend that approach.

> I have a list of other qemu-user problems (I mean mainly the bsd-user part
> of the qemu code here), not counting normal coding bugs:
> - the code is not thread-safe but is used in a threaded environment (rw
> locks, for example),
> - emulating some sysctls and resource limit / usage behavior is very
> hard (mainly if we emulate a 32-bit guest on a 64-bit host)

In many such cases, the proper behaviour can be found in the kernel code
(when a 64-bit kernel needs to handle a system call from a 32-bit
process).

I expect problems with getdirentries() and struct dirent.d_off with
filesystems that return hashed filenames as positions.

> - if host syscall returns ERESTART, we should do full unroll and pass it
> to guest.

Yes (with the above mentioned caveats about how ERESTART is returned).

> - the syscalls emulation should not use the libc functions, but 

Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

2019-01-04 Thread Mark Millard



On 2019-Jan-3, at 22:56, Michal Meloun  wrote:

> On 29.12.2018 18:47, Dennis Clarke wrote:
>> On 12/28/18 9:56 PM, Mark Millard via freebsd-arm wrote:
>>> 
>>> On 2018-Dec-28, at 12:12, Mark Millard  wrote:
>>> 
>>>> On 2018-Dec-28, at 05:13, Michal Meloun
>>>> wrote:
>>>>
>>>>> Mark,
>>>>> this is a known problem with qemu-user-static.
>>>>> Emulation of every single interruptible syscall is broken by design (it
>>>>> has signal-related races). These races cannot be solved without a major
>>>>> rewrite of the syscall emulation code.
>>>>> Unfortunately, nobody is actively working on this, I think.
> 
>> 
>> Following along here quietly and I had to blink at this a few times.
>> Is there a bug report somewhere within the qemu world related to this
>>  'broken by design' qemu feature?
> 
> Firstly, I apologize for late answer. Writing a technically accurate but
> still comprehensible report is extremely difficult for me.

Thanks for doing so.

> . . .
> Mark, I hope that this is also the answer to your question posted to
> hackers@ and also the explanation of why you see the hang.

Again thanks: it was helpful for my gaining some understanding of
the code structure.

But it turns out that another of your list of problems is involved
in the hang-up:

> . . .
> - and last major one. At this time, all guest structures are maintained
> by hand. Due to huge amount of these structures, this is the extreme
> error prone approach.  We should convert this to script generated code,
> including guest syscalls definition.

It turns out that "struct target_cmsghdr" has the wrong overall size,
the wrong first field size, and the wrong offsets for later fields
for amd64->aarch64 use (or likely any 64-bit->64-bit host-target
pair, even amd64->x86_64). In fact the code reports via:

    gemu_log("Unsupported ancillary data: %d/%d\n",
             cmsg->cmsg_level, cmsg->cmsg_type);

because of cmsg->cmsg_level and cmsg->cmsg_type ending up with
messed up values. It hangs after that message shows up. The
more complete code containing that gemu_log call is:

    if ((cmsg->cmsg_level == TARGET_SOL_SOCKET) &&
        (cmsg->cmsg_type == SCM_RIGHTS)) {
        int *fd = (int *)data;
        int *target_fd = (int *)target_data;
        int i, numfds = len / sizeof(int);

        for (i = 0; i < numfds; i++) {
            fd[i] = tswap32(target_fd[i]);
        }
    } else if ((cmsg->cmsg_level == TARGET_SOL_SOCKET) &&
               (cmsg->cmsg_type == SCM_TIMESTAMP) &&
               (len == sizeof(struct timeval))) {
        /* copy struct timeval to host */
        struct timeval *tv = (struct timeval *)data;
        struct target_freebsd_timeval *target_tv =
            (struct target_freebsd_timeval *)target_data;
        __get_user(tv->tv_sec, &target_tv->tv_sec);
        __get_user(tv->tv_usec, &target_tv->tv_usec);
    } else {
        gemu_log("Unsupported ancillary data: %d/%d\n",
                 cmsg->cmsg_level, cmsg->cmsg_type);
        memcpy(data, target_data, len);
    }

Of 3 types of hangups that I've run into recently, one was from a
missing statement, one was from struct target_kevent having the
wrong overall size and wrong field offsets after the first field
(amd64->armv7 was an example), and the one involving struct
target_cmsghdr above. (There may be more to the target_cmsghdr
one.)

> Again, my apology for the slightly (or very) chaotic report, but this is
> the best I am capable of.

Not chaotic in my view.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

2019-01-03 Thread Michal Meloun
On 29.12.2018 18:47, Dennis Clarke wrote:
> On 12/28/18 9:56 PM, Mark Millard via freebsd-arm wrote:
>>
>> On 2018-Dec-28, at 12:12, Mark Millard  wrote:
>>
>>> On 2018-Dec-28, at 05:13, Michal Meloun 
>>> wrote:
>>>
>>>> Mark,
>>>> this is a known problem with qemu-user-static.
>>>> Emulation of every single interruptible syscall is broken by design (it
>>>> has signal-related races). These races cannot be solved without a major
>>>> rewrite of the syscall emulation code.
>>>> Unfortunately, nobody is actively working on this, I think.

> 
> Following along here quietly and I had to blink at this a few times.
> Is there a bug report somewhere within the qemu world related to this
>  'broken by design' qemu feature?

Firstly, I apologize for the late answer. Writing a technically accurate but
still comprehensible report is extremely difficult for me.

The major design issue with qemu-user is the fact that guest (blocking /
interruptible) syscalls must be emulated atomically, including the
delivery of asynchronous signals (including signals originating from
other threads).
This is something that cannot be emulated precisely by a user-mode
program without specific kernel support. Let me explain this in a
little more detail.

Assume that we have the following trivial code:
void sig_alarm_handler(…)
{
  if (!done) {
    do some work;
    alarm(10);
  }
}

void foo(void)
{
  install_signal_handler(SIGALRM, sig_alarm_handler);
  alarm(10);
  do some work;
  while (true) {
    rv = select(…, NULL);
    if (rv == 0)
      do some work;
    else if (errno != EINTR)
      report error and exit;
  }
}

In a native environment, this code works well. It calls the alarm signal
handler every 10s, irrespective of whether the signal fires in the program
code, in the libc implementation of select(), or while the program is
waiting in the kernel part of the select() syscall.

In the qemu-user environment, things get significantly harder. Qemu can
deliver signals to the guest only on an instruction boundary, because the
guest signal handler should see the emulated CPU context in a consistent
state. But the kernel can deliver a signal to qemu at any time. Due to
this, qemu must store delivered signals in a queue and emit them later,
when the emulator steps over the next instruction boundary.
Assume that qemu is just emulating the 'syscall' instruction from the guest
select() call. Also assume that no other signals (but SIGALRM) are
generated, and that the socket used in select() never receives or transmits
any data.

The first version of qemu-user code emulating select() was:
abi_long do_freebsd_select(..)
{
 convert input guest arguments to host;
 rv = select(…);
 convert output host arguments to guest;
 return(rv);
}

But this is very racy. If the alarm signal fires before select(…) enters
the kernel, qemu queues it (but does not deliver it to the guest because it
isn't on an instruction boundary) and continues the emulation. And because
(in our case) select() waits indefinitely, the alarm signal is never
delivered to the guest and the whole program hangs.

The actual qemu code emulating select() looks like:
abi_long do_freebsd_select(..)
{
  convert input guest arguments to host;
  sigfillset(…);
  sigprocmask(SIG_BLOCK, …, …);
  if (ts->signal_pending) {
    sigprocmask(SIG_SETMASK, …, NULL);
    /* We have a signal pending so just poll select() and return. */
    tv2.tv_sec = tv2.tv_usec = 0;
    ret = select(…, &tv2);
    if (ret == 0)
      ret = TARGET_EINTR;
  } else {
    ret = pselect(…);
    sigprocmask(SIG_SETMASK, …, NULL);
  }
  convert output host arguments to guest;
  return(ret);
}

This looks much better. The code blocks all signals first, then checks
whether any signal is pending. If so, it does a non-blocking select()
(because the timeout is zero) and correctly returns EINTR immediately.
Otherwise, it uses another variant of select(), pselect(), which installs
the right signal mask itself.
That means the syscall is called with signal delivery blocked, but the
kernel installs the right sigmask before it waits for an event. While this
looks like a perfect solution that closes all races from the first version,
it is not. pselect() has different semantics than select(): it
doesn't update the timeout argument. So this solution is also inappropriate.
Moreover, I think we don't have p*() equivalents (like pselect()) for all
blocking syscalls.
Mark, I hope that this is also the answer to your question posted to
hackers@ and also the explanation of why you see the hang.

Linux uses a different approach to overcome this issue, safe_syscall ->
https://gitlab.collabora.com/tomeu/qemu/commit/4d330cee37a21aabfc619a1948953559e66951a4
It looks like a workable workaround, but I'm not sure about ERESTART
versus EINTR return values. IMHO, this can be a problem.

I have a list of other qemu-user problems (I mean mainly the bsd-user part
of the qemu code here), not counting normal coding bugs:
- the code is not thread-safe but is used in a threaded environment (rw
locks, for example),
- emulating some sysctls and resource limit / usage behavior is very
hard (mainly if we emulate a 32-bit guest on a 64-bit host)
- if host syscall returns 

Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved) [details of a specific qemu-arm-static source code problem]

2018-12-31 Thread Mark Millard
[I listed my /usr/src svn version information instead of /usr/ports .
Correcting. . .]

On 2018-Dec-31, at 12:05, Mark Millard  wrote:

> On 2018-Dec-31, at 10:16, Jonathan Chen  wrote:
> 
>> On Mon, 31 Dec 2018 at 21:05, Mark Millard  wrote:
>> [...]
>>> But if you have a form of hang-up that shows no sign of being tied
>>> to kevent or hangs-up only sometimes, I'd be surprised if the __packed
>>> change(s) would fix the issue.
>> 
>> With the __packed-modified qemu-user-static, the amd64->armv7
>> crossbuilds does not hang anymore, but I get build failures instead.
>> Interestingly enough, an unmodified qemu-user-static gets further
>> along in a amd64->armv6 crossbuild, with only one reproducible hang.
> 
> I tend to compare cross-build failures to native-build attempts. The
> multimedia-gstreamer1-qt@qt5 hang-up was qemu-arm-static specific,
> not occurring native. That and being reliable about hanging-up is
> what prompted the investigation.
> 
> The lld thread fanout hangup also has only happened under
> qemu-arm-static but I do not have a context with more than 4 cores for
> armv7: far less than 28 (FreeBSD under Hyper-V) or 32 cpus (FreeBSD
> native) that I use for cross-builds.
> 
> I do not know if you care to but it is possible to see if the FreeBSD
> package builders get failures or hangs for the same ports. I use
> head port build examples below:
> 
> http://beefy16.nyi.freebsd.org/jail.html?mastername=head-armv7-default
> 
> http://beefy8.nyi.freebsd.org/jail.html?mastername=head-armv6-default
> 
> The pages displayed show a list of port version (p??) and freebsd
> version (s??) looking like p??_s?? . Those links take you
> to pages for exploring the built, failed, skipped, and ignored
> ports.
> 
> Of course, for race-condition problems in builds, checking is messier
> because of needing to look at possibly many port/system combinations.
> 
> My attempts to build x11/lumina fail for:
> 
> [00:01:02] [01] [00:00:00] Building multimedia/libvpx | libvpx-1.7.0_2
> [00:02:23] [01] [00:01:21] Saved multimedia/libvpx | libvpx-1.7.0_2 wrkdir 
> to: 
> /usr/local/poudriere/data/wrkdirs/FBSDFSSDjailArmV7-default/default/libvpx-1.7.0_2.tar
> [00:02:23] [01] [00:01:21] Finished multimedia/libvpx | libvpx-1.7.0_2: 
> Failed: build
> [00:02:24] [01] [00:01:22] Skipping multimedia/ffmpeg | ffmpeg-4.1,1: 
> Dependent port multimedia/libvpx | libvpx-1.7.0_2 failed
> [00:02:24] [01] [00:01:22] Skipping multimedia/gstreamer1-libav | 
> gstreamer1-libav-1.14.4_2: Dependent port multimedia/libvpx | libvpx-1.7.0_2 
> failed
> [00:02:24] [01] [00:01:22] Skipping multimedia/gstreamer1-plugins-core | 
> gstreamer1-plugins-core-1.14: Dependent port multimedia/libvpx | 
> libvpx-1.7.0_2 failed
> [00:02:24] [01] [00:01:22] Skipping x11/lumina | lumina-1.4.1,3: Dependent 
> port multimedia/libvpx | libvpx-1.7.0_2 failed
> [00:02:24] [01] [00:01:22] Skipping x11/lumina-core | lumina-core-1.4.1: 
> Dependent port multimedia/libvpx | libvpx-1.7.0_2 failed
> . . .
> [00:06:19] Failed ports: multimedia/libvpx:build
> [00:06:19] Skipped ports: multimedia/ffmpeg multimedia/gstreamer1-libav 
> multimedia/gstreamer1-plugins-core x11/lumina x11/lumina-core
> [FBSDFSSDjailArmV7-default] [2018-12-30_17h04m02s] [committing:] Queued: 7  
> Built: 1  Failed: 1  Skipped: 5  Ignored: 0  Tobuild: 0   Time: 00:06:16
> 
> Native build attempts on an armv7 get the same.
> 
> But I'm still at:
> 
> . . .

Correcting to have the /usr/ports  information:

# svnlite info /usr/ports/ | grep "Re[plv]"
Relative URL: ^/head
Repository Root: svn://svn.freebsd.org/ports
Repository UUID: 35697150-7ecd-e111-bb59-0022644237b5
Revision: 484783
Last Changed Rev: 484783


> 
> because I froze at that while investigating the reliable hang and
> have not started progressing again yet. Last I looked the
> head-armv7-default package builds were also failing for libvpx if
> I remember right.

Looks like more recently libvpx builds on the package builders. So next time
that I update the ports tree I'll get to see the next problem (if any).

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved) [details of a specific qemu-arm-static source code problem]

2018-12-31 Thread Mark Millard
On 2018-Dec-31, at 10:16, Jonathan Chen  wrote:

> On Mon, 31 Dec 2018 at 21:05, Mark Millard  wrote:
> [...]
>> But if you have a form of hang-up that shows no sign of being tied
>> to kevent or hangs-up only sometimes, I'd be surprised if the __packed
>> change(s) would fix the issue.
> 
> With the __packed-modified qemu-user-static, the amd64->armv7
> crossbuilds does not hang anymore, but I get build failures instead.
> Interestingly enough, an unmodified qemu-user-static gets further
> along in a amd64->armv6 crossbuild, with only one reproducible hang.

I tend to compare cross-build failures to native-build attempts. The
multimedia-gstreamer1-qt@qt5 hang-up was qemu-arm-static specific,
not occurring native. That and being reliable about hanging-up is
what prompted the investigation.

The lld thread fanout hangup also has only happened under
qemu-arm-static but I do not have a context with more than 4 cores for
armv7: far less than 28 (FreeBSD under Hyper-V) or 32 cpus (FreeBSD
native) that I use for cross-builds.

I do not know if you care to but it is possible to see if the FreeBSD
package builders get failures or hangs for the same ports. I use
head port build examples below:

http://beefy16.nyi.freebsd.org/jail.html?mastername=head-armv7-default

http://beefy8.nyi.freebsd.org/jail.html?mastername=head-armv6-default

The pages displayed show a list of port version (p??) and freebsd
version (s??) looking like p??_s?? . Those links take you
to pages for exploring the built, failed, skipped, and ignored
ports.

Of course, for race-condition problems in builds, checking is messier
because of needing to look at possibly many port/system combinations.

My attempts to build x11/lumina fail for:

[00:01:02] [01] [00:00:00] Building multimedia/libvpx | libvpx-1.7.0_2
[00:02:23] [01] [00:01:21] Saved multimedia/libvpx | libvpx-1.7.0_2 wrkdir to: 
/usr/local/poudriere/data/wrkdirs/FBSDFSSDjailArmV7-default/default/libvpx-1.7.0_2.tar
[00:02:23] [01] [00:01:21] Finished multimedia/libvpx | libvpx-1.7.0_2: Failed: 
build
[00:02:24] [01] [00:01:22] Skipping multimedia/ffmpeg | ffmpeg-4.1,1: Dependent 
port multimedia/libvpx | libvpx-1.7.0_2 failed
[00:02:24] [01] [00:01:22] Skipping multimedia/gstreamer1-libav | 
gstreamer1-libav-1.14.4_2: Dependent port multimedia/libvpx | libvpx-1.7.0_2 
failed
[00:02:24] [01] [00:01:22] Skipping multimedia/gstreamer1-plugins-core | 
gstreamer1-plugins-core-1.14: Dependent port multimedia/libvpx | libvpx-1.7.0_2 
failed
[00:02:24] [01] [00:01:22] Skipping x11/lumina | lumina-1.4.1,3: Dependent port 
multimedia/libvpx | libvpx-1.7.0_2 failed
[00:02:24] [01] [00:01:22] Skipping x11/lumina-core | lumina-core-1.4.1: 
Dependent port multimedia/libvpx | libvpx-1.7.0_2 failed
. . .
[00:06:19] Failed ports: multimedia/libvpx:build
[00:06:19] Skipped ports: multimedia/ffmpeg multimedia/gstreamer1-libav 
multimedia/gstreamer1-plugins-core x11/lumina x11/lumina-core
[FBSDFSSDjailArmV7-default] [2018-12-30_17h04m02s] [committing:] Queued: 7  
Built: 1  Failed: 1  Skipped: 5  Ignored: 0  Tobuild: 0   Time: 00:06:16

Native build attempts on an armv7 get the same.

But I'm still at:

# svnlite info | grep "Re[plv]"
Relative URL: ^/head
Repository Root: svn://svn.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 341836
Last Changed Rev: 341836

because I froze at that while investigating the reliable hang and
have not started progressing again yet. Last I looked the
head-armv7-default package builds were also failing for libvpx if
I remember right.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved) [details of a specific qemu-arm-static source code problem]

2018-12-31 Thread Jonathan Chen
On Mon, 31 Dec 2018 at 21:05, Mark Millard  wrote:
[...]
> But if you have a form of hang-up that shows no sign of being tied
> to kevent or hangs-up only sometimes, I'd be surprised if the __packed
> change(s) would fix the issue.

With the __packed-modified qemu-user-static, the amd64->armv7
crossbuilds do not hang anymore, but I get build failures instead.
Interestingly enough, an unmodified qemu-user-static gets further
along in an amd64->armv6 crossbuild, with only one reproducible hang.

Cheers.
-- 
Jonathan Chen 


Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved) [details of a specific qemu-arm-static source code problem]

2018-12-31 Thread Jonathan Chen
On Mon, 31 Dec 2018 at 14:34, Mark Millard via freebsd-ports
 wrote:
>
> [Removing __packed did make the size and offsets match armv7
> and the build worked based on the reconstructed qemu-arm-static.]

Thanks for the analysis Mark! I've been suffering quite a few hangups
with my ports crossbuilds on amd64->armv7 on 12-STABLE, and I'll be
trying your suggestions to see whether it resolves the issue.

Cheers.
-- 
Jonathan Chen 


Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved) [details of a specific qemu-arm-static source code problem]

2018-12-31 Thread Mark Millard
On 2018-Dec-30, at 21:01, Jonathan Chen  wrote:

> On Mon, 31 Dec 2018 at 14:34, Mark Millard via freebsd-ports
>  wrote:
>> 
>> [Removing __packed did make the size and offsets match armv7
>> and the build worked based on the reconstructed qemu-arm-static.]
> 
> Thanks for the analysis Mark! I've been suffering quite a few hangups
> with my ports crossbuilds on amd64->armv7 on 12-STABLE, and I'll be
> trying your suggestions to see whether it resolves the issue.

If you have something like a kqread state for a hang-up consistently
in the same place, then Mikael Urankar's fix (or any other
way of getting the right sizes and field offsets for kevent) has a
chance of fixing what you have observed.

But if you have a form of hang-up that shows no sign of being tied
to kevent or hangs-up only sometimes, I'd be surprised if the __packed
change(s) would fix the issue.

I've seen such racy hang-ups from lld's creation of (#cpu)+2 threads,
as FreeBSD counts cpus. I've selectively forced -Wl,--no-threads at
times in specific contexts to avoid that. binutils ld does not tolerate
the option. ports does not appear to have an equivalent of:

LDFLAGS.lld+= -Wl,--no-threads

that would be lld specific.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved) [details of a specific qemu-arm-static source code problem]

2018-12-30 Thread Mark Millard
[Removing __packed did make the size and offsets match armv7
and the build worked based on the reconstructed qemu-arm-static.]

On 2018-Dec-30, at 16:38, Mark Millard  wrote:

> On 2018-Dec-28, at 12:12, Mark Millard  wrote:
> 
>> On 2018-Dec-28, at 05:13, Michal Meloun  wrote:
>> 
>>> Mark,
>>> this is known problem with qemu-user-static.
>>> Emulation of every single interruptible syscall is broken by design (it
>>> have signal related races). Theses races cannot be solved without major
>>> rewrite of syscall emulation code.
>>> Unfortunately, nobody actively works on this, I think.
>>> 
>> 
>> Thanks for the note setting some expectations.
>> . . .
> 
> 
> It turns out that I've been through (part of?) this before and
> mikael.uran...@gmail.com had back then provided a qemu-user-static
> patch (that might have been arm specific or 32-bit target specific
> when running on a 64-bit host). (The qemu-user-static code structure
> seems to have changed some afterwards and the patch is no longer
> where he had pointed me to back then.)
> 
> To show size and offsets on armv7 vs. amd64 for struct kevent
> I use:
> 
> # more kevent_size_offsets.c 
> #include "/usr/include/sys/event.h" // kevent
> #include <stddef.h> // offsetof
> #include <stdio.h>  // printf
> 
> int
> main()
> {
>    printf("%lu\n", (unsigned long) sizeof(struct kevent));
>    printf("ident %lu\n", (unsigned long) offsetof(struct kevent, ident));
>    printf("filter %lu\n", (unsigned long) offsetof(struct kevent, filter));
>    printf("flags %lu\n", (unsigned long) offsetof(struct kevent, flags));
>    printf("fflags %lu\n", (unsigned long) offsetof(struct kevent, fflags));
>    printf("data %lu\n", (unsigned long) offsetof(struct kevent, data));
>    printf("udata %lu\n", (unsigned long) offsetof(struct kevent, udata));
>    printf("ext %lu\n", (unsigned long) offsetof(struct kevent, ext));
>    return 0;
> }
> 
> It ends up showing on armv7 (under qemu-arm-static instead of native, not
> that it matters here):
> 
> # ./a.out
> 64
> ident 0
> filter 4
> flags 6
> fflags 8
> data 16
> udata 24
> ext 32
> 
> On amd64 (native) it ends up as:
> 
> # ./a.out
> 64
> ident 0
> filter 8
> flags 10
> fflags 12
> data 16
> udata 24
> ext 32
> 
> Thus a translation of layout is required when hosted. This is for:
> 
> struct kevent {
>    __uintptr_t     ident;   /* identifier for this event */
>    short           filter;  /* filter for event */
>    unsigned short  flags;   /* action flags for kqueue */
>    unsigned int    fflags;  /* filter flag value */
>    __int64_t       data;    /* filter data value */
>    void            *udata;  /* opaque user data identifier */
>    __uint64_t      ext[4];  /* extensions */
> };
> 
> But qemu-user-static has for translation purposes:
> 
> struct target_freebsd_kevent {
>    abi_ulong  ident;
>    int16_t    filter;
>    uint16_t   flags;
>    uint32_t   fflags;
>    int64_t    data;
>    abi_ulong  udata;
>    uint64_t   ext[4];
> } __packed;
> 
> (note the __packed), which in amd64's qemu-arm-static has
> the size and offsets:
> 
> # gdb qemu-arm-static
> . . .
> (gdb) p/d sizeof(struct target_freebsd_kevent)
> $1 = 56
> (gdb) p/d &((struct target_freebsd_kevent *)0)->ident
> $2 = 0
> (gdb) p/d &((struct target_freebsd_kevent *)0)->filter
> $3 = 4
> (gdb) p/d &((struct target_freebsd_kevent *)0)->flags
> $4 = 6
> (gdb) p/d &((struct target_freebsd_kevent *)0)->fflags
> $5 = 8
> (gdb) p/d &((struct target_freebsd_kevent *)0)->data
> $6 = 12
> (gdb) p/d &((struct target_freebsd_kevent *)0)->udata
> $7 = 20
> (gdb) p/d &((struct target_freebsd_kevent *)0)->ext
> $8 = 24
> 
> which does not match the armv7 offsets for
> data, udata, or ext and does not have the right size
> for struct target_freebsd_kevent[] indexing to
> match armv7's struct kevent[] indexing.
> 
> This in turn makes the do_freebsd_kevent code do the wrong
> thing in its:
> 
>    struct target_freebsd_kevent *target_changelist, *target_eventlist;
> . . .
>    for (i = 0; i < arg3; i++) {
>        __get_user(changelist[i].ident, &target_changelist[i].ident);
>        __get_user(changelist[i].filter, &target_changelist[i].filter);
>        __get_user(changelist[i].flags, &target_changelist[i].flags);
>        __get_user(changelist[i].fflags, &target_changelist[i].fflags);
>        __get_user(changelist[i].data, &target_changelist[i].data);
>        /* __get_user(changelist[i].udata, &target_changelist[i].udata); */
> #if TARGET_ABI_BITS == 32
>        changelist[i].udata = (void *)(uintptr_t)target_changelist[i].udata;
>        tswap32s((uint32_t *)&changelist[i].udata);
> #else
>        changelist[i].udata = (void *)(uintptr_t)target_changelist[i].udata;
>        tswap64s((uint64_t *)&changelist[i].udata);
> #endif
>        __get_user(changelist[i].ext[0], &target_changelist[i].ext[0]);
>__get_user(changelist[i].ext[1], 

Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved) [details of a specific qemu-arm-static source code problem]

2018-12-30 Thread Mark Millard



On 2018-Dec-28, at 12:12, Mark Millard  wrote:

> On 2018-Dec-28, at 05:13, Michal Meloun  wrote:
> 
>> Mark,
>> this is known problem with qemu-user-static.
>> Emulation of every single interruptible syscall is broken by design (it
>> have signal related races). Theses races cannot be solved without major
>> rewrite of syscall emulation code.
>> Unfortunately, nobody actively works on this, I think.
>> 
> 
> Thanks for the note setting some expectations.
> . . .


It turns out that I've been through (part of?) this before and
mikael.uran...@gmail.com had back then provided a qemu-user-static
patch (that might have been arm specific or 32-bit target specific
when running on a 64-bit host). (The qemu-user-static code structure
seems to have changed some afterwards and the patch is no longer
where he had pointed me to back then.)

To show size and offsets on armv7 vs. armd64 for struct kevent
I use:

# more kevent_size_offsets.c 
#include "/usr/include/sys/event.h" // kevent
#include  // offsetof
#include   // printf

int
main()
{
printf("%lu\n", (unsigned long) sizeof(struct kevent));
printf("ident %lu\n", (unsigned long) offsetof(struct kevent, ident));
printf("filter %lu\n", (unsigned long) offsetof(struct kevent, filter));
printf("flags %lu\n", (unsigned long) offsetof(struct kevent, flags));
printf("fflags %lu\n", (unsigned long) offsetof(struct kevent, fflags));
printf("data %lu\n", (unsigned long) offsetof(struct kevent, data));
printf("udata %lu\n", (unsigned long) offsetof(struct kevent, udata));
printf("ext %lu\n", (unsigned long) offsetof(struct kevent, ext));
return 0;
}

It ends up showing on armv7 (under qemu-arm-static instead of native, not
that it matters here):

# ./a.out
64
ident 0
filter 4
flags 6
fflags 8
data 16
udata 24
ext 32

On amd64 (native) it ends up as:

# ./a.out
64
ident 0
filter 8
flags 10
fflags 12
data 16
udata 24
ext 32

Thus a translation of layout is required when hosted. This is for:

struct kevent {
    __uintptr_t     ident;   /* identifier for this event */
    short           filter;  /* filter for event */
    unsigned short  flags;   /* action flags for kqueue */
    unsigned int    fflags;  /* filter flag value */
    __int64_t       data;    /* filter data value */
    void            *udata;  /* opaque user data identifier */
    __uint64_t      ext[4];  /* extensions */
};
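
For readers without an armv7 system at hand, the two layouts reported
above can be approximated with any C11 compiler. This is only a sketch:
the struct names kevent_armv7_layout and kevent_amd64_layout are made up
for illustration, fixed-width integers stand in for __uintptr_t and
void *, and _Alignas(8) is used on the assumption that the armv7 ABI
aligns 64-bit members to 8 bytes (which is what the 16/24/32 offsets
above suggest).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-in for struct kevent as laid out on armv7:
 * 32-bit ident/udata, 64-bit members aligned to 8 bytes (assumption). */
struct kevent_armv7_layout {
    uint32_t ident;
    int16_t  filter;
    uint16_t flags;
    uint32_t fflags;
    _Alignas(8) int64_t  data;
    uint32_t udata;
    _Alignas(8) uint64_t ext[4];
};

/* Illustrative stand-in for struct kevent as laid out on amd64:
 * 64-bit ident/udata. */
struct kevent_amd64_layout {
    uint64_t ident;
    int16_t  filter;
    uint16_t flags;
    uint32_t fflags;
    _Alignas(8) int64_t  data;
    uint64_t udata;
    _Alignas(8) uint64_t ext[4];
};

/* The expected values are the ones printed by the a.out runs above. */
static_assert(sizeof(struct kevent_armv7_layout) == 64, "armv7 size");
static_assert(offsetof(struct kevent_armv7_layout, filter) == 4, "armv7 filter");
static_assert(offsetof(struct kevent_armv7_layout, data) == 16, "armv7 data");
static_assert(offsetof(struct kevent_armv7_layout, udata) == 24, "armv7 udata");
static_assert(offsetof(struct kevent_armv7_layout, ext) == 32, "armv7 ext");
static_assert(sizeof(struct kevent_amd64_layout) == 64, "amd64 size");
static_assert(offsetof(struct kevent_amd64_layout, filter) == 8, "amd64 filter");
static_assert(offsetof(struct kevent_amd64_layout, fflags) == 12, "amd64 fflags");

/* Print the members whose offsets differ between the two layouts. */
void print_layout_differences(void)
{
    printf("filter armv7=%zu amd64=%zu\n",
           offsetof(struct kevent_armv7_layout, filter),
           offsetof(struct kevent_amd64_layout, filter));
    printf("flags  armv7=%zu amd64=%zu\n",
           offsetof(struct kevent_armv7_layout, flags),
           offsetof(struct kevent_amd64_layout, flags));
    printf("fflags armv7=%zu amd64=%zu\n",
           offsetof(struct kevent_armv7_layout, fflags),
           offsetof(struct kevent_amd64_layout, fflags));
}
```

The static_assert lines make the compiler check the numbers, so a
mismatch with the listings above would show up as a build error rather
than at run time.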

But qemu-user-static has for translation purposes:

struct target_freebsd_kevent {
abi_ulong  ident;
int16_tfilter;
uint16_t   flags;
uint32_t   fflags;
int64_t data;
abi_ulong  udata;
uint64_t  ext[4];
} __packed;

(note the __packed), for which the amd64 build of qemu-arm-static has
the size and offsets:

# gdb qemu-arm-static
. . .
(gdb) p/d sizeof(struct target_freebsd_kevent)
$1 = 56
(gdb) p/d &((struct target_freebsd_kevent *)0)->ident
$2 = 0
(gdb) p/d &((struct target_freebsd_kevent *)0)->filter
$3 = 4
(gdb) p/d &((struct target_freebsd_kevent *)0)->flags
$4 = 6
(gdb) p/d &((struct target_freebsd_kevent *)0)->fflags
$5 = 8
(gdb) p/d &((struct target_freebsd_kevent *)0)->data
$6 = 12
(gdb) p/d &((struct target_freebsd_kevent *)0)->udata
$7 = 20
(gdb) p/d &((struct target_freebsd_kevent *)0)->ext
$8 = 24

which does not match the armv7 offsets for
data, udata, or ext and does not have the right size
for struct target_freebsd_kevent[] indexing to
match armv7's struct kevent[] indexing.
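
The effect of the __packed attribute on both the offsets and the array
stride can be sketched outside of qemu. The names tkev_packed,
tkev_aligned and read_packed below are hypothetical, the member list is
the 32-bit-target one quoted above, __attribute__((packed)) is the
GCC/Clang spelling, and the aligned variant is just one possible
armv7-compatible layout, not a proposed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same members as target_freebsd_kevent for a 32-bit target, packed. */
struct tkev_packed {
    uint32_t ident;
    int16_t  filter;
    uint16_t flags;
    uint32_t fflags;
    int64_t  data;
    uint32_t udata;
    uint64_t ext[4];
} __attribute__((packed));

/* Same members with 8-byte alignment for the 64-bit fields (assumed to
 * be what the armv7 guest actually uses, per the a.out output above). */
struct tkev_aligned {
    uint32_t ident;
    int16_t  filter;
    uint16_t flags;
    uint32_t fflags;
    _Alignas(8) int64_t  data;
    uint32_t udata;
    _Alignas(8) uint64_t ext[4];
};

/* These match the gdb output above. */
static_assert(sizeof(struct tkev_packed) == 56, "packed size");
static_assert(offsetof(struct tkev_packed, data) == 12, "packed data");
static_assert(offsetof(struct tkev_packed, udata) == 20, "packed udata");
static_assert(offsetof(struct tkev_packed, ext) == 24, "packed ext");
static_assert(sizeof(struct tkev_aligned) == 64, "aligned size");

/* Copy element i out of a guest buffer using the packed 56-byte stride. */
struct tkev_packed read_packed(const void *guest, size_t i)
{
    struct tkev_packed e;
    memcpy(&e, (const unsigned char *)guest + i * sizeof e, sizeof e);
    return e;
}

/* Build two guest entries (idents 6 and 8) in the 64-byte layout and
 * return what the packed view reports for entry 1. */
struct tkev_packed second_entry_through_packed_view(void)
{
    struct tkev_aligned guest[2];
    memset(guest, 0, sizeof guest);
    guest[0].ident = 6;
    guest[1].ident = 8;
    return read_packed(guest, 1);
}
```

In this sketch the packed view of entry 1 starts at byte 56, inside
ext[3] of entry 0, so it reports ident 0 and the real ident 8 shows up
in fflags instead. It only illustrates the stride error; it is not a
reproduction of the kdump output.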

This in turn makes the do_freebsd_kevent code do the wrong
thing in its:

struct target_freebsd_kevent *target_changelist, *target_eventlist;
. . .
for (i = 0; i < arg3; i++) {
    __get_user(changelist[i].ident, &target_changelist[i].ident);
    __get_user(changelist[i].filter, &target_changelist[i].filter);
    __get_user(changelist[i].flags, &target_changelist[i].flags);
    __get_user(changelist[i].fflags, &target_changelist[i].fflags);
    __get_user(changelist[i].data, &target_changelist[i].data);
    /* __get_user(changelist[i].udata, &target_changelist[i].udata); */
#if TARGET_ABI_BITS == 32
    changelist[i].udata = (void *)(uintptr_t)target_changelist[i].udata;
    tswap32s((uint32_t *)&changelist[i].udata);
#else
    changelist[i].udata = (void *)(uintptr_t)target_changelist[i].udata;
    tswap64s((uint64_t *)&changelist[i].udata);
#endif
    __get_user(changelist[i].ext[0], &target_changelist[i].ext[0]);
    __get_user(changelist[i].ext[1], &target_changelist[i].ext[1]);
    __get_user(changelist[i].ext[2], &target_changelist[i].ext[2]);
    __get_user(changelist[i].ext[3], &target_changelist[i].ext[3]);
}
. . .
for (i = 0; i < arg5; i++) {
    __put_user(eventlist[i].ident, &target_eventlist[i].ident);
    __put_user(eventlist[i].filter, &target_eventlist[i].filter);
    __put_user(eventlist[i].flags, &target_eventlist[i].flags);

Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

2018-12-29 Thread Mark Millard


On 2018-Dec-28, at 12:12, Mark Millard  wrote:

> On 2018-Dec-28, at 05:13, Michal Meloun  wrote:
> 
>> Mark,
>> this is a known problem with qemu-user-static.
>> Emulation of every single interruptible syscall is broken by design (it
>> has signal-related races). These races cannot be solved without major
>> rewrite of syscall emulation code.
>> Unfortunately, nobody actively works on this, I think.
>> 
> 
> Thanks for the note setting some expectations.
> 
> On the evidence that I have I expect that more is going on than that:
> 
> A) The hang-up always happens and always in the same place. So
> it would appear that no race is involved.
> 
> B) (A) is true even for varying the number of builders in parallel
> (so other builds also happening) and the number of jobs allowed per
> builder. It also fails for only one builder allowed only one process.
> (I get traces from that last kind of context.)
> 
> C) The problem started on the package-building servers for armv7
> and armv6 without qemu-user-static having an update (FreeBSD and
> cmake had updates, for example).
> 
> D) The problem is only observed for targeting armv7 and armv6 as
> far as I can tell. I've never seen it for aarch64, neither my
> own builds nor when I looked at the package-building server
> history.
> 
> At least that is what got me started. (I've since learned that
> qemu-user-static uses fork in place of a requested vfork.)
> 
> My ktrace/kdump experiment yesterday showed something odd for the
> kevent that hangs in cmake:
> 
> 93172 qemu-arm-static CALL  
> kevent(0x3,0x7ffe7d40,0x2,0x7ffd7d40,0x400,0)
> 93172 qemu-arm-static STRU  struct kevent[] = { { ident=6, 
> filter=EVFILT_READ, flags=0x1, fflags=0, data=0, udata=0x0 }
> { ident=0x0, filter=, flags=0, fflags=0x8, 
> data=0x1, udata=0x0 } }
> 
> Note the 0x2 argument to kevent and the apparently-odd 2nd entry in the struct
> kevent[]. The kevent use is from cmake.
> 
> So far I've not identified a signal being delivered at a time that would seem
> to me to be likely to contribute. (But this is not familiar code so my judgment
> is likely not the best.)
> 
> Note: I normally run FreeBSD using a non-debug kernel, even when using
> head. (The kernel does have symbols.)


The detail of the signal usage involved leading up to the hang-up,
starting from just before the "press return" for the "make FLAVOR=qt5"
command that I had entered:

The only "Interrupted system call" prior to my killing the hung cmake
process was (kdump -H -r -S output):

 93172 100717 qemu-arm-static CALL  execve[59](0x10392,0x8605051a0,0x860cf5400)
 93172 101706 qemu-arm-static RET   nanosleep[240] -1 errno 4 Interrupted system call
 93172 100717 qemu-arm-static NAMI  "/bin/sh"
 93172 100717 sh   RET   execve[59] JUSTRETURN
 93172 100717 sh   CALL  readlink[58](0x207a65,0x7fffccc0,0x400)

This is where ninja (via qemu-arm-static) execve's the amd64-native /bin/sh
(which in turn later runs cmake via qemu-arm-static). (This was after the fork
[for the requested vfork].) So the interruption is for the close-down of the
thread that was in nanosleep.

There were no PSIG's and no sigreturn's prior to the kill according to the
kdump output.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

2018-12-28 Thread Mark Millard
[Using ktrace/kdump shows an apparent oddity in the kevent use that
hangs up in cmake, not that I know it causes the hang-up.]

On 2018-Dec-28, at 00:16, Mark Millard  wrote:

> [The historical notes are removed and replaced by partial trace
> information from example hang-ups, not that I've figured out
> what contributes yet.]
> 
> I ran into the following while trying to get evidence
> about the hang-up for an amd64->armv7 cross-build of
> multimedia/gstreamer1-qt@qt5 .
> 
> The following is from trying to get evidence for the hang-up
> via a manual run of "make multimedia/gstreamer1-qt FLAVOR=qt5"
> in a poudriere bulk -i’s interactive mode for the context
> that has the hang-up in normal poudriere-devel runs.
> 
> 
> From top after the hang-up (to identify some context):
> 
> 14528 root  2  52  0  100M  24M  0 kqread  11   0:00   0.00% 
> /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen 
> /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.
> 14527 root  2  52  0   88M  13M  0 select  22   0:00   0.00% 
> /usr/local/bin/qemu-arm-static ninja -j1 -v all
> 
> from ps -auxd as well (to identify more context):
> 
> root   10114  0.0  0.0  10328  1756  1  I+J  13:47   0:00.01 |  
>`-- make FLAVOR=qt5
> root   14526  0.0  0.0  10204  1792  1  I+J  13:50   0:00.00 |  
>  `-- /bin/sh -e -c (cd 
> /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! 
> /usr/bin/env QT_SELE
> root   14527  0.0  0.0  90304 13084  1  I+J  13:50   0:00.09 |  
>`-- /usr/local/bin/qemu-arm-static ninja -j1 -v all
> root   14528  0.0  0.0 102876 25060  1  IJ   13:50   0:00.12 |  
>  `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E 
> cmake_autogen /wrkdirs/usr/ports/multimedia/g
> 
> I had made a qemu-user-static that enabled do_strace when
> it is used to run cmake or ninja.
> 
> The only do_strace lines from qemu-arm-static running cmake
> or ninja mentioning process 14528 are included in the sequence:
> 
> (Before the below was a long list of "14527 fstatat" lines.
> I’ll note that "Unknown syscall 545" is from ppoll use.)
> 
> 82400 sigprocmask(1,-1610620016,-191968524,-186261416,0,24) = 0
> 82400 sigaction(2,-1610620040,-191968596,-186261584,210460,0) = 0
> 82400 sigaction(15,-1610620040,-191968572,-186261584,210460,0) = 0
> 82400 sigaction(1,-1610620040,-191968548,-186261584,210460,0) = 0
> 82400 gettimeofday(-1610619984,0,4,-186261584,-1610619440,-1610619528) = 0
> 82400 gettimeofday(-1610619984,0,4,359949,1545969996,0) = 0
> 82400 gettimeofday(-1610620120,0,4,2,-184666112,-1610619520) = 0
> 82400 fstatat(-100,"elements/gstqtvideosink/CMakeFiles", 0x9fffe200, 0) = 0
> 82400 fstatat(-100,"elements/gstqtvideosink/gstqt5videosink_autogen", 
> 0x9fffe200, 0) = 0
> 82400 pipe2(-1610620176,0,-1610620108,0,-1610620120,167084) = 0
> 82400 fcntl(5,1,-1610620108,-185863932,-192200556,-1610620228) = 0
> 82400 fcntl(5,2,1,-185863932,-192200556,-1610620228) = 0
> 82400 vfork(0,66450,-186876196,-1610620184,-1610620240,0) = 82401
> 82400 close(6) = 0
> = 0
> 82400 Unknown syscall 545
> 82401 setpgid(0,0,-186876196,-1610620184,-1610620240,0) = 0
> 82401 sigprocmask(3,-191586912,0,-1610620184,-1610620240,0) = 0
> 82401 close(5) = 0
> 82401 open("/dev/null",0,0) = 5
> 82401 dup2(5,0,0,-1610620184,-1610620240,0) = 0
> 82401 close(5) = 0
> 82401 fcntl(0,2,0,-1610620184,-1610620240,0) = 0
> 82401 dup2(6,1,0,-1610620184,-1610620240,0) = 1
> 82401 fcntl(1,2,0,-1610620184,-1610620240,0) = 0
> 82401 dup2(6,2,0,-1610620184,-1610620240,0)82400 
> sigpending(-1610620072,1,0,-191968524,0,0) = 0
> 
> The vfork then close(6) sequence for 82400 vs. the later
> use of 6 in dup2 in 82401 may be rather odd. But it looks
> like qemu-*-static uses do_freebsd_fork to implement
> do_freebsd_vfork, despite reporting vfork before
> calling do_freebsd_vfork. (Does the close(6) appear to
> indicate a race for native operation of ninja for the
> period when the address space is shared?)
> 
> Ninja has Subprocess::Start code that has:
> 
> #ifdef POSIX_SPAWN_USEVFORK
>  flags |= POSIX_SPAWN_USEVFORK;
> #endif
> 
>  if (posix_spawnattr_setflags(&attr, flags) != 0)
>    Fatal("posix_spawnattr_setflags: %s", strerror(errno));
> 
>  const char* spawned_args[] = { "/bin/sh", "-c", command.c_str(), NULL };
>  if (posix_spawn(&pid_, "/bin/sh", &action, &attr,
>                  const_cast<char**>(spawned_args), environ) != 0)
>    Fatal("posix_spawn: %s", strerror(errno));
> 
> that is in use here. I think that this explains the vfork use.
> 
> 
> It turns out that putting the hung-up build in the background
> and then killing 82401 with the likes of kill -6 leads to more
> output that had apparently been buffered. It shows the use of
> the (amd64 native) /bin/sh that in turn leads to
> /usr/local/bin/cmake via qemu-arm-static. /bin/sh, being
> native, gets no do_strace output from qemu-arm-static.
> 
> 82400 

Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

2018-12-28 Thread Mark Millard



On 2018-Dec-28, at 05:13, Michal Meloun  wrote:

> Mark,
> this is a known problem with qemu-user-static.
> Emulation of every single interruptible syscall is broken by design (it
> has signal-related races). These races cannot be solved without major
> rewrite of syscall emulation code.
> Unfortunately, nobody actively works on this, I think.
> 

Thanks for the note setting some expectations.

On the evidence that I have I expect that more is going on than that:

A) The hang-up always happens and always in the same place. So
it would appear that no race is involved.

B) (A) is true even for varying the number of builders in parallel
(so other builds also happening) and the number of jobs allowed per
builder. It also fails for only one builder allowed only one process.
(I get traces from that last kind of context.)

C) The problem started on the package-building servers for armv7
and armv6 without qemu-user-static having an update (FreeBSD and
cmake had updates, for example).

D) The problem is only observed for targeting armv7 and armv6 as
far as I can tell. I've never seen it for aarch64, neither my
own builds nor when I looked at the package-building server
history.

At least that is what got me started. (I've since learned that
qemu-user-static uses fork in place of a requested vfork.)

My ktrace/kdump experiment yesterday showed something odd for the
kevent that hangs in cmake:

93172 qemu-arm-static CALL  
kevent(0x3,0x7ffe7d40,0x2,0x7ffd7d40,0x400,0)
93172 qemu-arm-static STRU  struct kevent[] = { { ident=6, filter=EVFILT_READ, 
flags=0x1, fflags=0, data=0, udata=0x0 }
 { ident=0x0, filter=, flags=0, fflags=0x8, 
data=0x1, udata=0x0 } }

Note the 0x2 argument to kevent and the apparently-odd 2nd entry in the struct
kevent[]. The kevent use is from cmake.

So far I've not identified a signal being delivered at a time that would seem
to me to be likely to contribute. (But this is not familiar code so my judgment
is likely not the best.)

Note: I normally run FreeBSD using a non-debug kernel, even when using
head. (The kernel does have symbols.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

2018-12-28 Thread Michal Meloun



On 24.12.2018 8:28, Mark Millard wrote:
> [I built a FreeBSD head -r340288 context and tried ports head
> -r484783 and the problem repeated.]
> 
> On 2018-Dec-22, at 12:55, Mark Millard  wrote:
> 
>> [I found my E-mail records reporting successful builds using
>> qemu-user-static from ports head -r484783 under FreeBSD
>> head -r340287.]
>>
>> On 2018-Dec-22, at 00:10, Mark Millard  wrote:
>>
>>> [I messed up the freebsd-emulation email address the first time I sent
>>> this. I also forgot to indicate the qemu-user-static vintage relationship.]
>>>
>>> I had been reporting intermittent hang-ups for my amd64->{aarch64,armv7}
>>> port cross builds in another message sequence. But it turns out that one
>>> thing I ran into has hung-up every time, the same way, for amd64->armv7
>>> cross builds: multimedia/gstreamer1-qt@qt5 . So I extract the material
>>> here into a separate report with some updated notes.
>>>
>>> A little context: I had built from ports head -r484783 before under
>>> FreeBSD head -r340287 (as I remember the version). Back then it did not
>>> have this problem that it now has under FreeBSD head -r341836 . One
>>> ports-specific change was to force perl5.28 as the default instead of
>>> perl5.26 originally. In fact this is what drives what is being rebuilt
>>> for my experiment that caught this. But I doubt the perl version is
>>> important to the problem. The context has a Ryzen Threadripper 1950X and
>>> has been tested both for FreeBSD under Hyper-V and for the same media
>>> native-booted. Both hang-up at the same point as seen via ps or top. The
>>> native tools for cross-build speedup were in use. Cross-builds targeting
>>> aarch64 did not get this problem but targeting armv7 did. 121 of 129
>>> armv7 ports built before the hang-up for the first armv7 try.
>>>
>>> ADDED: The qemu-user-static back with head -r340287 before installing the
>>> updated ports would likely be different than the -r484783 vintage. So both
>>> FreeBSD and qemu-user-static may have changed over the comparison.
>>
>> CORRECTION to ADDED: Back on 2018-Nov-11 I reported successful cross-builds
>> based on qemu-user-static from ports head -r484783 --all built under FreeBSD
>> head -r340287 . So the use of the perl5.28 as the forced-default and the
>> newer FreeBSD head version -r341836 as the context are the differences here.
>>
>>> The hang-up:
>>>
>>> In the port rebuilds targeting armv7, multimedia/gstreamer1-qt@qt5
>>> hung-up and timed out. Looking during the wait in later tries shows
>>> something much like (from one of the examples):
>>>
>>> root   33719  0.0  0.0  12920  3528  0  I    11:40   0:00.03 | |  
>>>  `-- sh: poudriere[FBSDFSSDjailArmV7-default][02]: build_pkg 
>>> (gstreamer1-qt5-1.2.0_14) (sh)
>>> root   41551  0.0  0.0  12920  3520  0  I    11:43   0:00.00 | |  
>>>`-- sh: poudriere[FBSDFSSDjailArmV7-default][02]: build_pkg 
>>> (gstreamer1-qt5-1.2.0_14) (sh)
>>> root   41552  0.0  0.0  10340  1744  0  IJ   11:43   0:00.01 | |  
>>>  `-- /usr/bin/make -C /usr/ports/multimedia/gstreamer1-qt 
>>> FLAVOR=qt5 build
>>> root   41566  0.0  0.0  10236  1796  0  IJ   11:43   0:00.00 | |  
>>>`-- /bin/sh -e -c (cd 
>>> /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! 
>>> /usr/bin/env QT_SELE
>>> root   41567  0.0  0.0  89976 12896  0  IJ   11:43   0:00.07 | |  
>>>  `-- /usr/local/bin/qemu-arm-static ninja -j28 -v all
>>> root   41585  0.0  0.0 102848 25056  0  IJ   11:43   0:00.10 | |  
>>>|-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake 
>>> -E cmake_autogen /wrkdirs/usr/ports/multimedia/g
>>> root   41586  0.0  0.0 102852 25072  0  IJ   11:43   0:00.11 | |  
>>>`-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake 
>>> -E cmake_autogen /wrkdirs/usr/ports/multimedia/g
>>>
>>> or as top showed it:
>>>
>>> 41552 root  1  52  0   10M  1744K  0 wait    15   0:00   0.00% 
>>> /usr/bin/make -C /usr/ports/multimedia/gstreamer1-qt FLAVOR=qt5 build
>>> 41566 root  1  52  0   10M  1796K  0 wait     1   0:00   0.00% 
>>> /bin/sh -e -c (cd 
>>> /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! 
>>> /usr/bin/env QT_SELECT=qt5 QMAKEMODULES
>>> 41567 root  2  52  0   88M    13M  0 select   4   0:00   0.00% 
>>> /usr/local/bin/qemu-arm-static ninja -j28 -v all
>>> 41585 root  2  52  0  100M    24M  0 kqread   8   0:00   0.00% 
>>> /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen 
>>> /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.
>>> 41586 root  2  52  0  100M    24M  0 kqread  22   0:00   0.00% 
>>> /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen 
>>> /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.
>>>
>>> So: 

Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

2018-12-28 Thread Mark Millard
[The historical notes are removed and replaced by partial trace
information from example hang-ups, not that I've figured out
what contributes yet.]

I ran into the following while trying to get evidence
about the hang-up for an amd64->armv7 cross-build of
multimedia/gstreamer1-qt@qt5 .

The following is from trying to get evidence for the hang-up
via a manual run of "make multimedia/gstreamer1-qt FLAVOR=qt5"
in a poudriere bulk -i’s interactive mode for the context
that has the hang-up in normal poudriere-devel runs.


From top after the hang-up (to identify some context):

14528 root  2  52  0  100M  24M  0 kqread  11   0:00   0.00% 
/usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen 
/wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.
14527 root  2  52  0   88M  13M  0 select  22   0:00   0.00% 
/usr/local/bin/qemu-arm-static ninja -j1 -v all

from ps -auxd as well (to identify more context):

root   10114  0.0  0.0  10328  1756  1  I+J  13:47   0:00.01 |
 `-- make FLAVOR=qt5
root   14526  0.0  0.0  10204  1792  1  I+J  13:50   0:00.00 |
   `-- /bin/sh -e -c (cd 
/wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! /usr/bin/env 
QT_SELE
root   14527  0.0  0.0  90304 13084  1  I+J  13:50   0:00.09 |
 `-- /usr/local/bin/qemu-arm-static ninja -j1 -v all
root   14528  0.0  0.0 102876 25060  1  IJ   13:50   0:00.12 |
   `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E 
cmake_autogen /wrkdirs/usr/ports/multimedia/g

I had made a qemu-user-static that enabled do_strace when
it is used to run cmake or ninja.

The only do_strace lines from qemu-arm-static running cmake
or ninja mentioning process 14528 are included in the sequence:

(Before the below was a long list of "14527 fstatat" lines.
I’ll note that "Unknown syscall 545" is from ppoll use.)

82400 sigprocmask(1,-1610620016,-191968524,-186261416,0,24) = 0
82400 sigaction(2,-1610620040,-191968596,-186261584,210460,0) = 0
82400 sigaction(15,-1610620040,-191968572,-186261584,210460,0) = 0
82400 sigaction(1,-1610620040,-191968548,-186261584,210460,0) = 0
82400 gettimeofday(-1610619984,0,4,-186261584,-1610619440,-1610619528) = 0
82400 gettimeofday(-1610619984,0,4,359949,1545969996,0) = 0
82400 gettimeofday(-1610620120,0,4,2,-184666112,-1610619520) = 0
82400 fstatat(-100,"elements/gstqtvideosink/CMakeFiles", 0x9fffe200, 0) = 0
82400 fstatat(-100,"elements/gstqtvideosink/gstqt5videosink_autogen", 
0x9fffe200, 0) = 0
82400 pipe2(-1610620176,0,-1610620108,0,-1610620120,167084) = 0
82400 fcntl(5,1,-1610620108,-185863932,-192200556,-1610620228) = 0
82400 fcntl(5,2,1,-185863932,-192200556,-1610620228) = 0
82400 vfork(0,66450,-186876196,-1610620184,-1610620240,0) = 82401
82400 close(6) = 0
 = 0
82400 Unknown syscall 545
82401 setpgid(0,0,-186876196,-1610620184,-1610620240,0) = 0
82401 sigprocmask(3,-191586912,0,-1610620184,-1610620240,0) = 0
82401 close(5) = 0
82401 open("/dev/null",0,0) = 5
82401 dup2(5,0,0,-1610620184,-1610620240,0) = 0
82401 close(5) = 0
82401 fcntl(0,2,0,-1610620184,-1610620240,0) = 0
82401 dup2(6,1,0,-1610620184,-1610620240,0) = 1
82401 fcntl(1,2,0,-1610620184,-1610620240,0) = 0
82401 dup2(6,2,0,-1610620184,-1610620240,0)82400 
sigpending(-1610620072,1,0,-191968524,0,0) = 0

The vfork then close(6) sequence for 82400 vs. the later
use of 6 in dup2 in 82401 may be rather odd. But it looks
like qemu-*-static uses do_freebsd_fork to implement
do_freebsd_vfork, despite reporting vfork before
calling do_freebsd_vfork. (Does the close(6) appear to
indicate a race for native operation of ninja for the
period when the address space is shared?)
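
As a side illustration of the fork-based case only (it says nothing
about native vfork semantics), the following sketch shows that once the
child comes from fork(), the parent closing its copy of the pipe's write
end right away does not stop the child from later using its own copy in
dup2(). The function name child_write_survives_parent_close and the
100 ms delay are arbitrary choices for the demonstration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns 1 if the child's message arrives intact, 0 otherwise. */
int child_write_survives_parent_close(void)
{
    static const char msg[] = "child still has fd\n";
    char buf[64];
    size_t total = 0;
    ssize_t n;
    int fds[2], status;
    pid_t pid;

    assert(sizeof msg <= sizeof buf);
    if (pipe(fds) != 0)
        return 0;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return 0;
    }
    if (pid == 0) {
        /* Child: wait so the parent's close() very likely runs first. */
        struct timespec delay = { 0, 100 * 1000 * 1000 };
        close(fds[0]);
        nanosleep(&delay, NULL);
        if (dup2(fds[1], STDOUT_FILENO) < 0)
            _exit(1);
        if (write(STDOUT_FILENO, msg, sizeof msg - 1) < 0)
            _exit(1);
        _exit(0);
    }
    /* Parent: close the write end immediately after fork(). */
    close(fds[1]);
    while (total < sizeof buf - 1 &&
           (n = read(fds[0], buf + total, sizeof buf - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(fds[0]);
    if (waitpid(pid, &status, 0) < 0)
        return 0;
    return strcmp(buf, msg) == 0;
}
```

Each process has its own descriptor table after fork(), which is why the
ordering seen in the trace (parent close(6), then child dup2(6,...)) is
not a problem by itself in the emulated case.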

Ninja has Subprocess::Start code that has:

#ifdef POSIX_SPAWN_USEVFORK
  flags |= POSIX_SPAWN_USEVFORK;
#endif


  if (posix_spawnattr_setflags(&attr, flags) != 0)
    Fatal("posix_spawnattr_setflags: %s", strerror(errno));

  const char* spawned_args[] = { "/bin/sh", "-c", command.c_str(), NULL };
  if (posix_spawn(&pid_, "/bin/sh", &action, &attr,
                  const_cast<char**>(spawned_args), environ) != 0)
    Fatal("posix_spawn: %s", strerror(errno));

that is in use here. I think that this explains the vfork use.
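
For anyone wanting to reproduce the same spawn pattern outside of ninja,
here is a minimal C sketch of "/bin/sh -c command" run through
posix_spawn with its stdout redirected into a pipe. It is not ninja's
code: the function name sh_output_equals and the 256-byte buffer are
made up for the example, and POSIX_SPAWN_USEVFORK is only used where the
headers happen to define it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <spawn.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run "/bin/sh -c command"; return 1 if it exits 0 and its stdout
 * equals expected, 0 otherwise. */
int sh_output_equals(const char *command, const char *expected)
{
    char buf[256];
    size_t total = 0;
    ssize_t n;
    int fds[2], rc, status;
    short flags = 0;
    pid_t pid;
    posix_spawn_file_actions_t action;
    posix_spawnattr_t attr;
    char *const argv[] = { "/bin/sh", "-c", (char *)command, NULL };

    assert(command != NULL && expected != NULL);
    if (pipe(fds) != 0)
        return 0;

    /* In the child: stdout becomes the pipe's write end. */
    posix_spawn_file_actions_init(&action);
    posix_spawn_file_actions_addclose(&action, fds[0]);
    posix_spawn_file_actions_adddup2(&action, fds[1], STDOUT_FILENO);
    posix_spawn_file_actions_addclose(&action, fds[1]);

    posix_spawnattr_init(&attr);
#ifdef POSIX_SPAWN_USEVFORK
    flags |= POSIX_SPAWN_USEVFORK;  /* non-standard; only where defined */
#endif
    posix_spawnattr_setflags(&attr, flags);

    rc = posix_spawn(&pid, "/bin/sh", &action, &attr, argv, environ);
    posix_spawn_file_actions_destroy(&action);
    posix_spawnattr_destroy(&attr);
    close(fds[1]);  /* parent keeps only the read end */
    if (rc != 0) {
        close(fds[0]);
        return 0;
    }

    while (total < sizeof buf - 1 &&
           (n = read(fds[0], buf + total, sizeof buf - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(fds[0]);

    if (waitpid(pid, &status, 0) < 0)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
           strcmp(buf, expected) == 0;
}
```

For example, sh_output_equals("echo hello", "hello\n") returns 1 on a
system with a working /bin/sh.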


It turns out that putting the hung-up build in the background
and then killing 82401 with the likes of kill -6 leads to more
output that had apparently been buffered. It shows the use of
the (amd64 native) /bin/sh that in turn leads to
/usr/local/bin/cmake via qemu-arm-static. /bin/sh, being
native, gets no do_strace output from qemu-arm-static.

82400 sigpending(-1610620072,1,0,-191968524,0,0) = 0
82400 read(5,0x9fffd368,4096) = 58
82400 Unknown syscall 545
82400 sigpending(-1610620072,1,0,-191968524,0,0) = 0
82400 read(5,0x9fffd368,4096) = 0
82400 close(5) = 0
82400 wait4(82401,-1610620004,0,0,-191968640,0) = 82401
82400 mmap(0,86016,3,201330690,-1,-1610620169) = 0xf4777000
82400 gettimeofday(-1610620224,0,4,-1610619944,31,16777216) = 0

Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

2018-12-24 Thread Mark Millard
[A native poudriere-devel based build of
multimedia/gstreamer1-qt@qt5 did not hang-up
and worked fine. Official package build history
also provides some evidence.]

On 2018-Dec-22, at 12:55, Mark Millard  wrote:

> [I found my E-mail records reporting successful builds using
> qemu-user-static from ports head -r484783 under FreeBSD
> head -r340287.]
> 
> On 2018-Dec-22, at 00:10, Mark Millard  wrote:
> 
>> [I messed up the freebsd-emulation email address the first time I sent
>> this. I also forgot to indicate the qemu-user-static vintage relationship.]
>> 
>> I had been reporting intermittent hang-ups for my amd64->{aarch64,armv7}
>> port cross builds in another message sequence. But it turns out that one
>> thing I ran into has hung-up every time, the same way, for amd64->armv7
>> cross builds: multimedia/gstreamer1-qt@qt5 . So I extract the material
>> here into a separate report with some updated notes.
>> 
>> A little context: I had built from ports head -r484783 before under
>> FreeBSD head -r340287 (as I remember the version). Back then it did not
>> have this problem that it now has under FreeBSD head -r341836 . One
>> ports-specific change was to force perl5.28 as the default instead of
>> perl5.26 originally. In fact this is what drives what is being rebuilt
>> for my experiment that caught this. But I doubt the perl version is
>> important to the problem. The context has a Ryzen Threadripper 1950X and
>> has been tested both for FreeBSD under Hyper-V and for the same media
>> native-booted. Both hang-up at the same point as seen via ps or top. The
>> native tools for cross-build speedup were in use. Cross-builds targeting
>> aarch64 did not get this problem but targeting armv7 did. 121 of 129
>> armv7 ports built before the hang-up for the first armv7 try.
>> 
>> ADDED: The qemu-user-static back with head -r340287 before installing the
>> updated ports would likely be different than the -r484783 vintage. So both
>> FreeBSD and qemu-user-static may have changed over the comparison.
> 
> CORRECTION to ADDED: Back on 2018-Nov-11 I reported successful cross-builds
> based on qemu-user-static from ports head -r484783 --all built under FreeBSD
> head -r340287 . So the use of the perl5.28 as the forced-default and the
> newer FreeBSD head version -r341836 as the context are the differences here.
> 
>> The hang-up:
>> 
>> In the port rebuilds targeting armv7, multimedia/gstreamer1-qt@qt5
>> hung-up and timed out. Looking during the wait in later tries shows
>> something much like (from one of the examples):
>> 
>> root   33719  0.0  0.0  12920  3528  0  I    11:40   0:00.03 | |   
>> `-- sh: poudriere[FBSDFSSDjailArmV7-default][02]: build_pkg 
>> (gstreamer1-qt5-1.2.0_14) (sh)
>> root   41551  0.0  0.0  12920  3520  0  I    11:43   0:00.00 | |   
>>   `-- sh: poudriere[FBSDFSSDjailArmV7-default][02]: build_pkg 
>> (gstreamer1-qt5-1.2.0_14) (sh)
>> root   41552  0.0  0.0  10340  1744  0  IJ   11:43   0:00.01 | |   
>> `-- /usr/bin/make -C /usr/ports/multimedia/gstreamer1-qt 
>> FLAVOR=qt5 build
>> root   41566  0.0  0.0  10236  1796  0  IJ   11:43   0:00.00 | |   
>>   `-- /bin/sh -e -c (cd 
>> /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! 
>> /usr/bin/env QT_SELE
>> root   41567  0.0  0.0  89976 12896  0  IJ   11:43   0:00.07 | |   
>> `-- /usr/local/bin/qemu-arm-static ninja -j28 -v all
>> root   41585  0.0  0.0 102848 25056  0  IJ   11:43   0:00.10 | |   
>>   |-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E 
>> cmake_autogen /wrkdirs/usr/ports/multimedia/g
>> root   41586  0.0  0.0 102852 25072  0  IJ   11:43   0:00.11 | |   
>>   `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E 
>> cmake_autogen /wrkdirs/usr/ports/multimedia/g
>> 
>> or as top showed it:
>> 
>> 41552 root  1  52  0   10M  1744K  0 wait    15   0:00   0.00% 
>> /usr/bin/make -C /usr/ports/multimedia/gstreamer1-qt FLAVOR=qt5 build
>> 41566 root  1  52  0   10M  1796K  0 wait     1   0:00   0.00% 
>> /bin/sh -e -c (cd 
>> /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! 
>> /usr/bin/env QT_SELECT=qt5 QMAKEMODULES
>> 41567 root  2  52  0   88M    13M  0 select   4   0:00   0.00% 
>> /usr/local/bin/qemu-arm-static ninja -j28 -v all
>> 41585 root  2  52  0  100M    24M  0 kqread   8   0:00   0.00% 
>> /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen 
>> /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.
>> 41586 root  2  52  0  100M    24M  0 kqread  22   0:00   0.00% 
>> /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen 
>> /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.
>> 
>> So: waiting in kqread trying to run cmake.
>> 
>> Unlike some intermittent 

Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

2018-12-23 Thread Mark Millard
[I built a FreeBSD head -r340288 context and tried ports head
-r484783 and the problem repeated.]

On 2018-Dec-22, at 12:55, Mark Millard  wrote:

> [I found my E-mail records reporting successful builds using
> qemu-user-static from ports head -r484783 under FreeBSD
> head -r340287.]
> 
> On 2018-Dec-22, at 00:10, Mark Millard  wrote:
> 
>> [I messed up the freebsd-emulation email address the first time I sent
>> this. I also forgot to indicate the qemu-user-static vintage relationship.]

Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

2018-12-22 Thread Mark Millard
[I found my E-mail records reporting successful builds using
qemu-user-static from ports head -r484783 under FreeBSD
head -r340287.]

On 2018-Dec-22, at 00:10, Mark Millard  wrote:

> [I messed up the freebsd-emulation email address the first time I sent
> this. I also forgot to indicate the qemu-user-static vintage relationship.]
> 
> ADDED: The qemu-user-static back with head -r340287 before installing the
> updated ports would likely be different than the -r484783 vintage. So both
> FreeBSD and qemu-user-static may have changed over the comparison.

CORRECTION to ADDED: Back on 2018-Nov-11 I reported successful cross-builds
based on qemu-user-static from ports head -r484783, all built under FreeBSD
head -r340287 . So the use of perl5.28 as the forced default and the
newer FreeBSD head version -r341836 as the context are the differences here.


Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

2018-12-22 Thread Mark Millard
[I messed up the freebsd-emulation email address the first time I sent
this. I also forgot to indicate the qemu-user-static vintage relationship.]

I had been reporting intermittent hang-ups for my amd64->{aarch64,armv7} port
cross builds in another message sequence. But it turns out that one thing I ran
into has hung-up every time, the same way, for amd64->armv7 cross builds:
multimedia/gstreamer1-qt@qt5 . So I extract the material here into a separate
report with some updated notes.

A little context: I had built from ports head -r484783 before under FreeBSD head
-r340287 (as I remember the version). Back then it did not have this problem
that it now has under FreeBSD head -r341836 . One ports-specific change was to
force perl5.28 as the default instead of perl5.26 originally. In fact this is
what drives what is being rebuilt for my experiment that caught this. But I
doubt the perl version is important to the problem. The context has a Ryzen
Threadripper 1950X and has been tested both for FreeBSD under Hyper-V and for
the same media native-booted. Both hang-up at the same point as seen via ps or
top. The native tools for cross-build speedup were in use. Cross-builds
targeting aarch64 did not get this problem but targeting armv7 did. 121 of 129
armv7 ports built before the hang-up for the first armv7 try.

ADDED: The qemu-user-static back with head -r340287 before installing the
updated ports would likely be different than the -r484783 vintage. So both
FreeBSD and qemu-user-static may have changed over the comparison.


The hang-up:

In the port rebuilds targeting armv7, multimedia/gstreamer1-qt@qt5 hung-up and
timed out. Looking during the wait in later tries shows something much like
(from one of the examples):

root   33719  0.0  0.0  12920  3528  0  I    11:40   0:00.03 | |   `-- sh: poudriere[FBSDFSSDjailArmV7-default][02]: build_pkg (gstreamer1-qt5-1.2.0_14) (sh)
root   41551  0.0  0.0  12920  3520  0  I    11:43   0:00.00 | |     `-- sh: poudriere[FBSDFSSDjailArmV7-default][02]: build_pkg (gstreamer1-qt5-1.2.0_14) (sh)
root   41552  0.0  0.0  10340  1744  0  IJ   11:43   0:00.01 | |       `-- /usr/bin/make -C /usr/ports/multimedia/gstreamer1-qt FLAVOR=qt5 build
root   41566  0.0  0.0  10236  1796  0  IJ   11:43   0:00.00 | |         `-- /bin/sh -e -c (cd /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! /usr/bin/env QT_SELE
root   41567  0.0  0.0  89976 12896  0  IJ   11:43   0:00.07 | |           `-- /usr/local/bin/qemu-arm-static ninja -j28 -v all
root   41585  0.0  0.0 102848 25056  0  IJ   11:43   0:00.10 | |             |-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/g
root   41586  0.0  0.0 102852 25072  0  IJ   11:43   0:00.11 | |             `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/g

or as top showed it:

41552 root  1  52  0   10M  1744K  0 wait    15  0:00  0.00% /usr/bin/make -C /usr/ports/multimedia/gstreamer1-qt FLAVOR=qt5 build
41566 root  1  52  0   10M  1796K  0 wait     1  0:00  0.00% /bin/sh -e -c (cd /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! /usr/bin/env QT_SELECT=qt5 QMAKEMODULES
41567 root  2  52  0   88M    13M  0 select   4  0:00  0.00% /usr/local/bin/qemu-arm-static ninja -j28 -v all
41585 root  2  52  0  100M    24M  0 kqread   8  0:00  0.00% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.
41586 root  2  52  0  100M    24M  0 kqread  22  0:00  0.00% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.

So: waiting in kqread trying to run cmake.
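
For anyone wanting to pick the stuck processes out of a top listing like the
one above without reading it by eye, here is a minimal sketch in Python. It is
only an illustration: the column order (PID, USERNAME, THR, PRI, NICE, SIZE,
RES, SWAP, STATE, C, TIME, WCPU, COMMAND) is an assumption inferred from the
listing, the names parse_top_line and processes_in_state are made up for this
example, and the sample lines are shortened copies of the lines reported here.

```python
# Assumed (not confirmed) column order of one top(1) process line,
# inferred from the listing in this report.
FIELDS = ("pid", "user", "thr", "pri", "nice", "size", "res", "swap",
          "state", "cpu", "time", "wcpu")


def parse_top_line(line):
    """Split one whitespace-separated top line into a dict of fields."""
    # Split off the fixed columns; everything after them is the command.
    parts = line.split(None, len(FIELDS))
    if len(parts) <= len(FIELDS):
        raise ValueError("line has too few columns: %r" % line)
    record = dict(zip(FIELDS, parts))
    record["pid"] = int(record["pid"])
    record["command"] = parts[len(FIELDS)]
    return record


def processes_in_state(lines, state):
    """Return the parsed records whose STATE column equals `state`."""
    return [r for r in map(parse_top_line, lines) if r["state"] == state]


# Shortened copies of the lines from this report (commands cut for width).
SAMPLE = [
    "41552 root 1 52 0 10M 1744K 0 wait 15 0:00 0.00% /usr/bin/make -C /usr/ports/multimedia/gstreamer1-qt FLAVOR=qt5 build",
    "41567 root 2 52 0 88M 13M 0 select 4 0:00 0.00% /usr/local/bin/qemu-arm-static ninja -j28 -v all",
    "41585 root 2 52 0 100M 24M 0 kqread 8 0:00 0.00% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen",
    "41586 root 2 52 0 100M 24M 0 kqread 22 0:00 0.00% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen",
]

if __name__ == "__main__":
    for rec in processes_in_state(SAMPLE, "kqread"):
        print(rec["pid"], rec["state"], rec["command"])
```

Filtering on "kqread" here selects the two emulated cmake processes, which
matches what was seen by eye in the listing.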

Unlike some intermittent hang-ups, attaching-then-detaching via gdb does not
resume the hung-up processes. Kills of the processes waiting on kqread stop
the build.
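
As a rough illustration of the "hangs, times out, gets killed" sequence, the
sketch below runs a command with a time limit and, when the limit expires,
kills the command together with its descendants. This is not how poudriere
implements its timeout; run_with_timeout is a made-up name, and the use of a
separate session plus os.killpg() is just one way (an assumption of this
example) to reach a whole process tree such as make -> sh -> ninja -> cmake.

```python
import os
import signal
import subprocess
import sys


def run_with_timeout(argv, timeout_s):
    """Run argv; return (finished, returncode).

    On timeout the whole process group of the child is killed, so that
    children of the child (e.g. tools spawned by a build driver) go too.
    """
    # start_new_session=True makes the child a session and group leader,
    # so its process-group id equals its pid.
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return True, proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the child so it does not linger as a zombie
        return False, proc.returncode


if __name__ == "__main__":
    # Stand-in for a hung build step: a child that just sleeps.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(hung, 0.5))
```

A driver built this way reports the step as failed rather than waiting
forever, which is the most that can be done from the outside while the
underlying hang is not fixed.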

Given the prior ports have been built already, building just
multimedia/gstreamer1-qt@qt5 still gets the hang-up at the same point.

Building anything that requires multimedia/gstreamer1-qt@qt5 seems to be
solidly blocked in my environment.



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

2018-12-21 Thread Mark Millard
I had been reporting intermittent hang-ups for my amd64->{aarch64,armv7} port
cross builds in another message sequence. But it turns out that one thing I ran
into has hung-up every time, the same way, for amd64->armv7 cross builds:
multimedia/gstreamer1-qt@qt5 . So I extract the material here into a separate
report with some updated notes.

A little context: I had built from ports head -r484783 before under FreeBSD head
-r340287 (as I remember the version). Back then it did not have this problem
that it now has under FreeBSD head -r341836 . One ports-specific change was to
force perl5.28 as the default instead of perl5.26 originally. In fact this is
what drives what is being rebuilt for my experiment that caught this. But I
doubt the perl version is important to the problem. The context has a Ryzen
Threadripper 1950X and has been tested both for FreeBSD under Hyper-V and for
the same media native-booted. Both hang-up at the same point as seen via ps or
top. The native tools for cross-build speedup were in use. Cross-builds
targeting aarch64 did not get this problem but targeting armv7 did. 121 of 129
armv7 ports built before the hang-up for the first armv7 try.


The hang-up:

In the port rebuilds targeting armv7, multimedia/gstreamer1-qt@qt5 hung-up and
timed out. Looking during the wait in later tries shows something much like
(from one of the examples):

root   33719  0.0  0.0  12920  3528  0  I    11:40   0:00.03 | |   `-- sh: poudriere[FBSDFSSDjailArmV7-default][02]: build_pkg (gstreamer1-qt5-1.2.0_14) (sh)
root   41551  0.0  0.0  12920  3520  0  I    11:43   0:00.00 | |     `-- sh: poudriere[FBSDFSSDjailArmV7-default][02]: build_pkg (gstreamer1-qt5-1.2.0_14) (sh)
root   41552  0.0  0.0  10340  1744  0  IJ   11:43   0:00.01 | |       `-- /usr/bin/make -C /usr/ports/multimedia/gstreamer1-qt FLAVOR=qt5 build
root   41566  0.0  0.0  10236  1796  0  IJ   11:43   0:00.00 | |         `-- /bin/sh -e -c (cd /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! /usr/bin/env QT_SELE
root   41567  0.0  0.0  89976 12896  0  IJ   11:43   0:00.07 | |           `-- /usr/local/bin/qemu-arm-static ninja -j28 -v all
root   41585  0.0  0.0 102848 25056  0  IJ   11:43   0:00.10 | |             |-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/g
root   41586  0.0  0.0 102852 25072  0  IJ   11:43   0:00.11 | |             `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/g

or as top showed it:

41552 root  1  52  0   10M  1744K  0 wait    15  0:00  0.00% /usr/bin/make -C /usr/ports/multimedia/gstreamer1-qt FLAVOR=qt5 build
41566 root  1  52  0   10M  1796K  0 wait     1  0:00  0.00% /bin/sh -e -c (cd /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! /usr/bin/env QT_SELECT=qt5 QMAKEMODULES
41567 root  2  52  0   88M    13M  0 select   4  0:00  0.00% /usr/local/bin/qemu-arm-static ninja -j28 -v all
41585 root  2  52  0  100M    24M  0 kqread   8  0:00  0.00% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.
41586 root  2  52  0  100M    24M  0 kqread  22  0:00  0.00% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.

So: waiting in kqread trying to run cmake.

Unlike some intermittent hang-ups, attaching-then-detaching via gdb does not
resume the hung-up processes. Kills of the processes waiting on kqread stop
the build.

Given the prior ports have been built already, building just
multimedia/gstreamer1-qt@qt5 still gets the hang-up at the same point.

Building anything that requires multimedia/gstreamer1-qt@qt5 seems to be
solidly blocked in my environment.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
