Re: Coq build issues in F32

2020-03-28 Thread Richard W.M. Jones
On Fri, Mar 27, 2020 at 05:55:42PM -0600, Jerry James wrote:
> On Fri, Mar 27, 2020 at 4:42 PM Jerry James  wrote:
> > Good idea.  Thank you!
> 
> I got a segfault on s390x on the second build attempt.  I'd like to
> investigate the ephemeron angle a bit, I think.

No crashes here overnight, but I'll leave it running.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-27 Thread Stephen J. Turnbull
Jerry James writes:

 > I think Richard addressed most of the points you raised.

Yes, and no new questions, either.  So I think I'm done here (well,
I'll spectate, but the internals of the OCaml compiler are well beyond
my skillset :-).

Good luck!

Steve

-- 
Associate Professor  Division of Policy and Planning Science
http://turnbull/sk.tsukuba.ac.jp/ Faculty of Systems and Information
Email: turnb...@sk.tsukuba.ac.jp   University of Tsukuba
Tel: 029-853-5175 Tennodai 1-1-1, Tsukuba 305-8573 JAPAN
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-27 Thread Jerry James
On Fri, Mar 27, 2020 at 4:42 PM Jerry James  wrote:
> Good idea.  Thank you!

I got a segfault on s390x on the second build attempt.  I'd like to
investigate the ephemeron angle a bit, I think.
-- 
Jerry James
http://www.jamezone.org/
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-27 Thread Jerry James
On Fri, Mar 27, 2020 at 4:41 PM Richard W.M. Jones  wrote:
> I'll run it in a loop overnight and see if I can make it crash.

Good idea.  Thank you!
-- 
Jerry James
http://www.jamezone.org/
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-27 Thread Richard W.M. Jones
On Fri, Mar 27, 2020 at 04:20:16PM -0600, Jerry James wrote:
> On Fri, Mar 27, 2020 at 2:41 PM Richard W.M. Jones  wrote:
> > Yup, and the stack trace shows the failure happening when creating an
> > ephemeron.
> 
> On the other hand, a scratch build with coq upstream's final patch for
> ocaml 4.10.0 succeeded on the first try:
> 
> https://koji.fedoraproject.org/koji/taskinfo?taskID=42808115
> 
> Where this is an intermittent problem, I'll have to try more builds to
> be sure, but this looks promising.

I'll run it in a loop overnight and see if I can make it crash.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-27 Thread Jerry James
On Fri, Mar 27, 2020 at 2:41 PM Richard W.M. Jones  wrote:
> Yup, and the stack trace shows the failure happening when creating an
> ephemeron.

On the other hand, a scratch build with coq upstream's final patch for
ocaml 4.10.0 succeeded on the first try:

https://koji.fedoraproject.org/koji/taskinfo?taskID=42808115

Where this is an intermittent problem, I'll have to try more builds to
be sure, but this looks promising.
-- 
Jerry James
http://www.jamezone.org/
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-27 Thread Richard W.M. Jones
On Fri, Mar 27, 2020 at 11:54:44AM -0600, Jerry James wrote:
> On Thu, Mar 26, 2020 at 10:28 AM Richard W.M. Jones  wrote:
> > Jerry, I suggest this bug is real, but is also likely to be a bug in
> > Coq (most likely) or the OCaml runtime, possibly in the Weak module.
> > You might have more luck asking the upstream developers for help.
> 
> Coq issue: https://github.com/coq/coq/issues/11939
> 
> I looked around in the ocaml issue tracker and found a possibility:
> 
> https://github.com/ocaml/ocaml/issues/9391
> 
> Coq uses ephemerons.

Yup, and the stack trace shows the failure happening when creating an
ephemeron.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-27 Thread Richard W.M. Jones
On Fri, Mar 27, 2020 at 11:15:01AM -0600, Jerry James wrote:
> On Fri, Mar 27, 2020 at 11:03 AM Jerry James  wrote:
> > Okay, I will give that a try.
> 
> Except I can't.  We're already not using %_smp_mflags.  Ugh.

I realized that my build machine has MAKEFLAGS=-j24 which is why it's
building coq in parallel (and quickly).  So it seems like you could be
using _smp_mflags.

Another thing I tried today was enabling debugging in the GC (see
patch below).  However it also didn't find any problems, so I'm still
at a loss.

Rich.

diff --git a/coq.spec b/coq.spec
index df2a847..1e590cc 100644
--- a/coq.spec
+++ b/coq.spec
@@ -164,6 +164,10 @@ done
 %global opt_option -bytecode-compiler yes -byte-only -coqide byte
 %endif
 
+# Use debug runtime.
+sed -i 's/^\(OCAMLC :=.*\)/\1 -runtime-variant d/' Makefile.build
+sed -i 's/^\(OCAMLOPT :=.*\)/\1 -runtime-variant d/' Makefile.build
+
 %global coqdocdir %{?_pkgdocdir}%{!?_pkgdocdir:%{_docdir}/coq-%{version}}
 %global coqdatadir %{_libdir}/coq
 


-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-27 Thread Zbigniew Jędrzejewski-Szmek
On Fri, Mar 27, 2020 at 11:36:38AM -0600, Jerry James wrote:
> On Fri, Mar 27, 2020 at 11:30 AM Zbigniew Jędrzejewski-Szmek
>  wrote:
> > We are. Did you trying putting '%global _smp_mflags -j1' somewhere
> > near the top of the spec file?
> 
> We are?  Where?

That is what the packaging guidelines say [1]:
> Whenever possible, invocations of make should be done as
> %make_build

(which is defined as
$ rpmbuild --showrc |grep make_build
make_build %{__make} %{_make_output_sync} %{?_smp_mflags} %{_make_verbose}).

[1] https://docs.fedoraproject.org/en-US/packaging-guidelines/#_parallel_make

> The spec file does this to build:
> 
> make world VERBOSE=1

So it seems that this particular Makefile enables parallelization
internally. (make defaults -j1 unless told otherwise). But it should
be possible to somehow disable it wherever it is enabled.

Zbyszek
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-27 Thread Jerry James
On Thu, Mar 26, 2020 at 10:28 AM Richard W.M. Jones  wrote:
> Jerry, I suggest this bug is real, but is also likely to be a bug in
> Coq (most likely) or the OCaml runtime, possibly in the Weak module.
> You might have more luck asking the upstream developers for help.

Coq issue: https://github.com/coq/coq/issues/11939

I looked around in the ocaml issue tracker and found a possibility:

https://github.com/ocaml/ocaml/issues/9391

Coq uses ephemerons.
-- 
Jerry James
http://www.jamezone.org/
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-27 Thread Jerry James
On Fri, Mar 27, 2020 at 11:30 AM Zbigniew Jędrzejewski-Szmek
 wrote:
> We are. Did you trying putting '%global _smp_mflags -j1' somewhere
> near the top of the spec file?

We are?  Where?  The spec file does this to build:

make world VERBOSE=1

-- 
Jerry James
http://www.jamezone.org/
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-27 Thread Zbigniew Jędrzejewski-Szmek
On Fri, Mar 27, 2020 at 11:15:01AM -0600, Jerry James wrote:
> On Fri, Mar 27, 2020 at 11:03 AM Jerry James  wrote:
> > Okay, I will give that a try.
> 
> Except I can't.  We're already not using %_smp_mflags.  Ugh.

We are. Did you trying putting '%global _smp_mflags -j1' somewhere
near the top of the spec file?

Zbyszek
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-27 Thread Jerry James
On Fri, Mar 27, 2020 at 11:03 AM Jerry James  wrote:
> Okay, I will give that a try.

Except I can't.  We're already not using %_smp_mflags.  Ugh.
-- 
Jerry James
http://www.jamezone.org/
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-27 Thread Jerry James
On Wed, Mar 25, 2020 at 10:21 PM Stephen J. Turnbull  wrote:
> Hi Jerry!  Long time no see.

Hi Stephen!  It's good to hear from yet another person I've worked
with via network for years.

I think Richard addressed most of the points you raised.

> Where is the segfault?  Is it in coqc or is it in the generated code
> or is it in the OCaml compiler or is it somewhere else?

It's in coqc.  Richard's evidence seems to point to garbage collector
involvement, with weak references possibly playing a role.

> Stay healthy!

You, too, and likewise for everybody in the Fedora community.
-- 
Jerry James
http://www.jamezone.org/
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-27 Thread Jerry James
On Thu, Mar 26, 2020 at 10:28 AM Richard W.M. Jones  wrote:
> I don't know, now I can't even get the ‘fedpkg local’ to reproduce it :-(
>
> Jerry, I suggest this bug is real, but is also likely to be a bug in
> Coq (most likely) or the OCaml runtime, possibly in the Weak module.
> You might have more luck asking the upstream developers for help.
>
> As for what to do about Fedora 32, how about disabling _smp_mflags, on
> the basis that it might be load-related?  That should at least reduce
> the non-determinism.

Okay, I will give that a try.  I'm worried that even if I get a "good"
build, that it's going to explode on the hapless Fedora users who
install it.

Thank you for all the work you put into this, Richard.  I appreciate
the investigation.  I'll keep poking at it, too.
-- 
Jerry James
http://www.jamezone.org/
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-26 Thread Richard W.M. Jones
I don't know, now I can't even get the ‘fedpkg local’ to reproduce it :-(

Jerry, I suggest this bug is real, but is also likely to be a bug in
Coq (most likely) or the OCaml runtime, possibly in the Weak module.
You might have more luck asking the upstream developers for help.

As for what to do about Fedora 32, how about disabling _smp_mflags, on
the basis that it might be load-related?  That should at least reduce
the non-determinism.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-26 Thread Richard W.M. Jones
On Thu, Mar 26, 2020 at 12:00:14PM +, Richard W.M. Jones wrote:
> This bug is VERY annoying!  Although not very reproducible, I can
> usually hit it after about 4-6 hours of looping ‘fedpkg build’.

I mean ‘fedpkg local’.  Local builds, not building in Koji.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-26 Thread Richard W.M. Jones
This bug is VERY annoying!  Although not very reproducible, I can
usually hit it after about 4-6 hours of looping ‘fedpkg build’.

Considering the whole Coq build is quite lengthy you'd think that one
of these commands would hit it much more quickly:

  $ while true ; do gdb -ex 'set args -coqlib . -q -native-compiler yes 
theories/FSets/FMapFullAVL.v' -ex run bin/coqc -ex quit ; done 

or

  $ while bin/coqc -coqlib . -q -native-compiler yes 
theories/FSets/FMapFullAVL.v ; do date; done

But nope ...

My latest thing is to run the gdb command above _and_ run fedpkg build
(in another directory) at the same time, in the hope that it's caused
by system load.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-26 Thread Richard W.M. Jones
On Thu, Mar 26, 2020 at 01:20:58PM +0900, Stephen J. Turnbull wrote:
> Where is the segfault?  Is it in coqc or is it in the generated code
> or is it in the OCaml compiler or is it somewhere else?

I managed to reproduce this on my mostly Rawhide x86-64 development
machine by running fedpkg build in a loop overnight.  This also
captured a core dump.

Unfortunately the stack trace, even with full debuginfo installed, was
not very helpful.  I'm now trying to run the unstable bin/coqc command
over and over again under gdb, but the bug has not reproduced there so
far.

One thing I did learn from the stack trace is that it appears to fail
in C code (in the garbage collector).  However that might be because
the OCaml heap is corrupted by an earlier event.  Another thing I
learned is that it happens when doing something or other with weak
references, which can be non-deterministic.

> If you can't rule out OCaml: does it produce native code directly, or
> does it produce C or similar and then use GCC or similar to produce
> the native code?

OCaml produces native code directly.  GCC is only used to link the
final object files together.

Rich.


Core was generated by `bin/coqc -coqlib . -q -native-compiler yes 
theories/FSets/FMapFullAVL.v'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x556989736c94 in ?? ()
(gdb) t a a bt

Thread 1 (Thread 0x7f7453e1ec00 (LWP 2710541)):
#0  0x556989736c94 in ?? ()
#1  0x0001 in ?? ()
#2  0x0001 in ?? ()
#3  0x7f7453440ec8 in ?? ()
#4  0xfec0 in ?? ()
#5  0x5569c42daba8 in ?? ()
#6  0x0004 in ?? ()
#7  0x0002 in ?? ()
#8  0x00042d7f6c00 in ?? ()
#9  0x0001 in ?? ()

^-- probable bad stack frame

#10 0x5569c35adc63 in caml_alloc_shr ()

^-- garbage collector

https://github.com/ocaml/ocaml/blob/0756052841d0d3e6266c8c1ca4a608dc16c11441/runtime/memory.c#L560

#11 0x5569c35c00bc in caml_ephe_set_key ()

^-- weak ephemerons

https://github.com/ocaml/ocaml/blob/0756052841d0d3e6266c8c1ca4a608dc16c11441/runtime/weak.c#L211

#12 0x5569c35419e8 in camlStdlib__obj__set_key_306 ()
#13 0x5569c358da66 in camlStdlib__ephemeron__create_1033 ()

^-- OCaml functions Stdlib.Obj.set_key & Stdlib.Ephemeron.create

#14 0x7ffd45de5508 in ?? ()
#15 0x7ffd45de5500 in ?? ()
#16 0x7f744b7cf7b0 in ?? ()
#17 0x30124a25 in ?? ()
#18 0x7f7453745e88 in ?? ()
#19 0x0225 in ?? ()
#20 0x4917d3992e9b8500 in ?? ()
#21 0x5569c3a3c858 in camlCsymtable__9 ()
#22 0x in ?? ()

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Coq build issues in F32

2020-03-25 Thread Stephen J. Turnbull
Hi Jerry!  Long time no see.

Jerry James writes:

 > I built the whole stack for Rawhide with no issues.

Zero compiler warnings?  No fiddling to suppress compiler warnings
*anywhere* in the build chain (whether options or pragmas)?

Not suggesting you go looking for pragmas (probably have 10 million
lines of code :-( ), but you could check for warnings in the build
logs.  I didn't see any in the tail, though.

 > Attempt 1: the x86_64 build segfaulted while running "bin/coqc -coqlib . -q
 > -native-compiler yes   theories/FSets/FMapFullAVL.v", where coqc is a
 > binary built from the coq sources.  Coq is a theorem prover.  The coqc
 > run is essentially checking that a series of proof steps indeed prove
 > a theorem.  The "-native-compiler yes" part means that the OCaml
 > compiler will be invoked to generate native code for parts of this
 > process.

Where is the segfault?  Is it in coqc or is it in the generated code
or is it in the OCaml compiler or is it somewhere else?

If you can't rule out the OCaml compiler (or other parts of the OCaml
toolchain) and the generated code:
- what happens if you use -native-compiler=no?
- How is the native code used?  Compiled and linked as a .so, then
  dynamically loaded and called?  Compiled and linked as an
  executable, then run as a separate process?

If you can't rule out OCaml: does it produce native code directly, or
does it produce C or similar and then use GCC or similar to produce
the native code?

What optimizations to gcc are being used where it's being used?

Are other compilers (LLVM) supported in lieu of GCC for any parts of
the build chain?

 > 
 > I was puzzled by the segfault.  I decided to see if the segfault is
 > deterministic, so I launched...
 > 
 > Attempt 2: the x86_64 build succeeded.  The s390x build failed with an
 > illegal instruction error while running *the same command*.
 > 
 > Attempt 3: The s390x build again failed with an illegal instruction
 > error while running the same command, and aarch64 segfaulted while
 > running the same command.  All other arches succeeded.  I saved the
 > URL for this one:
 > 
 > https://koji.fedoraproject.org/koji/taskinfo?taskID=42768075

From the tail of the S390 build log:

rm -f theories/FSets/FMapFullAVL.glob
bin/coqc -coqlib . -q -native-compiler yes   theories/FSets/FMapFullAVL.v  
make[1]: *** [Makefile.build:880: theories/FSets/FMapFullAVL.vo] Illegal 
instruction (core dumped)
make[1]: *** [theories/FSets/FMapFullAVL.vo] Deleting file 
'theories/FSets/FMapFullAVL.glob'
make[1]: Leaving directory '/builddir/build/BUILD/coq-8.11.0'
make: *** [Makefile:177: submake] Error 2

That seems pretty unhelpful, as Makefile.build:880 is presumably just
the rule that invokes bin/coqc, which evidently is actually a compiler
driver that invokes various executables.  Do you have access to the
core that was allegedly dumped, to at least determine the identity of
the executable that segfaulted?  Is it possible to reproduce outside
of Koji?

Stay healthy!

-- 
Associate Professor  Division of Policy and Planning Science
http://turnbull.sk.tsukuba.ac.jp/ Faculty of Systems and Information
Email: turnb...@sk.tsukuba.ac.jp   University of Tsukuba
Tel: 029-853-5175 Tennodai 1-1-1, Tsukuba 305-8573 JAPAN
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-25 Thread Jerry James
On Wed, Mar 25, 2020 at 9:24 PM Jerry James  wrote:
> You'll want to view this with a fixed-width font.

Thank you, gmail, for completely destroying my formatting work.  Oh
well, you can probably tell what I was trying to do.
-- 
Jerry James
http://www.jamezone.org/
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-25 Thread Jerry James
On Wed, Mar 25, 2020 at 4:58 PM Jerry James  wrote:
> Either I got lucky when I launched the Rawhide build, or something is
> fundamentally different between F32 and Rawhide.  Clearly something
> nondeterministic is at work.  I have been unable to reproduce this
> failure in mock so far, but will keep trying.

I did a scratch build in Rawhide, just to see if this morning's
successful build was a fluke.  It succeeded on all architectures:

https://koji.fedoraproject.org/koji/taskinfo?taskID=42770190

So far, 2 out of 2 Rawhide builds succeeded, and 3 out of 3 F32 builds
ended with either a segfault or a bad instruction on at least 1
architecture.  Here are the side-by-side differences in the packages
in the aarch64 buildroots (since aarch64 was one of the failing
architectures on my last attempt), Rawhide on the left, F32 on the
right.  You'll want to view this with a fixed-width font.

adwaita-cursor-theme-3.36.0-1.fc33.noarch
adwaita-cursor-theme-3.36.0-1.fc32.noarch
adwaita-icon-theme-3.36.0-1.fc33.noarch
adwaita-icon-theme-3.36.0-1.fc32.noarch
annobin-9.12-3.fc33.aarch64
annobin-9.06-4.fc32.aarch64
at-spi2-atk-2.34.2-1.fc33.aarch64
at-spi2-atk-2.34.2-1.fc32.aarch64
at-spi2-atk-devel-2.34.2-1.fc33.aarch64
at-spi2-atk-devel-2.34.2-1.fc32.aarch64
at-spi2-core-2.36.0-1.fc33.aarch64
at-spi2-core-2.36.0-1.fc32.aarch64
at-spi2-core-devel-2.36.0-1.fc33.aarch64
at-spi2-core-devel-2.36.0-1.fc32.aarch64
audit-libs-3.0-0.19.20191104git1c2f876.fc33.aarch64
audit-libs-3.0-0.18.20191104git1c2f876.fc32.aarch64
avahi-libs-0.8-2.fc33.aarch64
avahi-libs-0.7-23.fc32.aarch64
binutils-2.34-2.fc33.aarch64
binutils-2.34-2.fc32.aarch64
binutils-gold-2.34-2.fc33.aarch64
binutils-gold-2.34-2.fc32.aarch64
cmake-filesystem-3.17.0-1.fc33.aarch64
cmake-filesystem-3.16.4-1.fc32.aarch64
coreutils-8.32-3.fc33.aarch64
coreutils-8.31-10.fc32.aarch64
coreutils-common-8.32-3.fc33.aarch64
coreutils-common-8.31-10.fc32.aarch64
cpp-10.0.1-0.9.fc33.aarch64
cpp-10.0.1-0.9.fc32.aarch64
crypto-policies-20200312-1.git3ae59d2.fc33.noarch
crypto-policies-20191128-5.gitcd267a5.fc32.noarch
cryptsetup-libs-2.3.1-1.fc33.aarch64
cryptsetup-libs-2.3.0-1.fc32.aarch64
curl-7.69.0-2.fc33.aarch64
curl-7.69.1-1.fc32.aarch64
cyrus-sasl-lib-2.1.27-4.fc33.aarch64
cyrus-sasl-lib-2.1.27-3.fc32.aarch64
dbus-1:1.12.16-5.fc33.aarch64
dbus-1:1.12.16-4.fc32.aarch64
dbus-broker-22-1.fc33.aarch64
dbus-broker-22-1.fc32.aarch64
dbus-common-1:1.12.16-5.fc33.noarch
dbus-common-1:1.12.16-4.fc32.noarch
dbus-devel-1:1.12.16-5.fc33.aarch64
dbus-devel-1:1.12.16-4.fc32.aarch64
dbus-libs-1:1.12.16-5.fc33.aarch64
dbus-libs-1:1.12.16-4.fc32.aarch64
dejavu-sans-fonts-2.37-7.fc33.noarch
dejavu-sans-fonts-2.37-7.fc32.noarch
fedora-gpg-keys-33-0.3.noarch
fedora-gpg-keys-32-0.7.noarch
fedora-release-33-0.3.noarch
fedora-release-32-0.8.noarch
fedora-release-common-33-0.3.noarch
fedora-release-common-32-0.8.noarch
fedora-repos-33-0.3.noarch
fedora-repos-32-0.7.noarch
fedora-repos-rawhide-33-0.3.noarch
file-5.38-3.fc33.aarch64
file-5.38-2.fc32.aarch64
file-libs-5.38-3.fc33.aarch64
file-libs-5.38-2.fc32.aarch64
fontconfig-2.13.92-8.fc33.aarch64
fontconfig-2.13.92-8.fc32.aarch64
fontconfig-devel-2.13.92-8.fc33.aarch64
fontconfig-devel-2.13.92-8.fc32.aarch64
fonts-filesystem-2.0.3-1.fc33.noarch
fonts-filesystem-2.0.3-1.fc32.noarch
fonts-srpm-macros-2.0.3-1.fc33.noarch
fonts-srpm-macros-2.0.3-1.fc32.noarch
fribidi-1.0.9-1.fc33.aarch64
fribidi-1.0.9-1.fc32.aarch64
fribidi-devel-1.0.9-1.fc33.aarch64
fribidi-devel-1.0.9-1.fc32.aarch64
gcc-10.0.1-0.9.fc33.aarch64
gcc-10.0.1-0.9.fc32.aarch64
gdb-minimal-9.1-4.fc33.aarch64
gdb-minimal-9.1-3.fc32.aarch64
ghostscript-9.50-1.fc33.aarch64
ghostscript-9.50-1.fc32.aarch64
ghostscript-core-9.50-1.fc33.aarch64
ghostscript-core-9.50-1.fc32.aarch64
ghostscript-tools-fonts-9.50-1.fc33.aarch64
ghostscript-tools-fonts-9.50-1.fc32.aarch64
ghostscript-tools-printing-9.50-1.fc33.aarch64
ghostscript-tools-printing-9.50-1.fc32.aarch64
glib-networking-2.64.0-1.fc33.aarch64
glib-networking-2.64.0-1.fc32.aarch64
glib2-2.64.1-1.fc33.aarch64
glib2-2.64.1-1.fc32.aarch64
glib2-devel-2.64.1-1.fc33.aarch64
glib2-devel-2.64.1-1.fc32.aarch64
glibc-2.31.9000-4.fc33.aarch64
glibc-2.31-1.fc32.aarch64
glibc-common-2.31.9000-4.fc33.aarch64
glibc-common-2.31-1.fc32.aarch64
glibc-devel-2.31.9000-4.fc33.aarch64
glibc-devel-2.31-1.fc32.aarch64
glibc-headers-2.31.9000-4.fc33.aarch64
glibc-headers-2.31-1.fc32.aarch64
glibc-minimal-langpack-2.31.9000-4.fc33.aarch64
glibc-minimal-langpack-2.31-1.fc32.aarch64
gmp-1:6.2.0-1.fc33.aarch64
gmp-1:6.1.2-13.fc32.aarch64
google-droid-sans-fonts-20200215-3.fc33.noarch
google-droid-sans-fonts-20200215-3.fc32.noarch
google-roboto-slab-fonts-1.100263-0.12.20150923git.fc33.noarch
google-roboto-slab-fonts-1.100263-0.11.20150923git.fc32.noarch
groff-base-1.22.3-24.fc33.aarch64
groff-base-1.22.3-21.fc32.aarch64
gsettings-desktop-schemas-3.36.0-1.fc33.aarch64
gsettings-desktop-schemas-3.36.0-1.fc32.aarch64
gtk-update-icon-cache-3.24.14-1.fc33.aarch64

Re: Coq build issues in F32

2020-03-25 Thread Jerry James
On Wed, Mar 25, 2020 at 8:57 PM Paul Dufresne via devel
 wrote:
> ... yeah, I know, might have no link at all with your Coq build... I
> don't know.

Maybe ... but that sure is suspicious.  I recall hearing that OpenJDK
is also suffering from weird segfaults.  H.

Thanks for the input, Paul.  I'll keep investigating.
-- 
Jerry James
http://www.jamezone.org/
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Coq build issues in F32

2020-03-25 Thread Paul Dufresne via devel
Le Wed, 25 Mar 2020 16:58:33 -0600,
Jerry James  a écrit :
> Either I got lucky when I launched the Rawhide build, or something is
> fundamentally different between F32 and Rawhide.  Clearly something
> nondeterministic is at work.  I have been unable to reproduce this
> failure in mock so far, but will keep trying.
> 
> If anybody has any ideas, I am all ears.  Thank you,

Well... It ring a bell for me, because I kind of seeing what seems to
be weird bugs in F32... I am kind of suspecting a gcc bug giving some
weird things... but I am not a developer, know a bit of assembler but
not enough to make the link between C code and assembler.

Anyway, one hour a go, I got a segment violation error while doing
updates with dnfdragora... and it seems that the code just jump to a
low address outside of the program:
rh bug #1817269

The other bug that make me suspect something might be wrong with gcc
was a bug that make a segment violation (seems to access a bit lower
address than a .png font file) in a game package that have not been
updated in many years as far as I know:
rh bug #1816471
but in that case, frankly the C code seems a bit hard for me to take
time to try to understand it.

What puzzle me in this case, is why a very old code would have stop to
work... that's why I came to the idea that maybe this is the generated
code itself that changed.

... yeah, I know, might have no link at all with your Coq build... I
don't know.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Coq build issues in F32

2020-03-25 Thread Jerry James
I've been talking to Richard Jones about this privately, and he
suggested that a message to fedora-devel could be helpful.

I've been talking for weeks about updating the entire coq stack,
badgering people into swapping reviews with me, etc.  Today I finally
had all the bits in place to do the updates.  I built the whole stack
for Rawhide with no issues.

The same builds need to happen for F32, where parts of the stack
currently have broken deps.  But I can't get coq to build
successfully, even though it did in Rawhide.  It went like this:

Attempt 1: the x86_64 build segfaulted while running "bin/coqc -coqlib . -q
-native-compiler yes   theories/FSets/FMapFullAVL.v", where coqc is a
binary built from the coq sources.  Coq is a theorem prover.  The coqc
run is essentially checking that a series of proof steps indeed prove
a theorem.  The "-native-compiler yes" part means that the OCaml
compiler will be invoked to generate native code for parts of this
process.

I was puzzled by the segfault.  I decided to see if the segfault is
deterministic, so I launched...

Attempt 2: the x86_64 build succeeded.  The s390x build failed with an
illegal instruction error while running *the same command*.

Attempt 3: The s390x build again failed with an illegal instruction
error while running the same command, and aarch64 segfaulted while
running the same command.  All other arches succeeded.  I saved the
URL for this one:

https://koji.fedoraproject.org/koji/taskinfo?taskID=42768075

Either I got lucky when I launched the Rawhide build, or something is
fundamentally different between F32 and Rawhide.  Clearly something
nondeterministic is at work.  I have been unable to reproduce this
failure in mock so far, but will keep trying.

If anybody has any ideas, I am all ears.  Thank you,
-- 
Jerry James
http://www.jamezone.org/
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org