Re: [Mono-dev] mono-3.2.1 make check failures sgen assertion

2013-08-19 Thread Charles Randall
Continuing this investigation, I ran the following using the mono-3-2 branch as 
of a0fc6ba35b7454425b8ec772b2652730b8030a52.

I couldn't run a top-level make check because of this bug,

https://bugzilla.xamarin.com/show_bug.cgi?id=14049

Because of this limitation, I ran make check in the mono directory.

I saw 117 failures in 441 iterations (26.5%). Here's the count of the tests that 
failed,

103 gsharing-valuetype-layout.exe
  8 sgen-bridge.exe|ms-conc
  2 gc-altstack.exe
  1 sgen-weakref-stress.exe|ms-par
  1 sgen-case-23400.exe|ms-par
  1 sgen-bridge.exe|plain
  1 bug-10127.exe
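
For anyone who wants to produce the same kind of tally from their own runs, here is a minimal sketch. It assumes the failed test names have already been collected one per line (extracting them from the make check output is left out), and both function names are made up for illustration.

```shell
# Hypothetical helper: read one failed test name per line on stdin and
# print a count per test, highest count first, like the table above.
tally_failures() {
    sort | uniq -c | sort -k1,1nr -k2
}

# Hypothetical helper: failure rate as a percentage with one decimal place.
# $1 = number of failures, $2 = number of iterations.
failure_rate() {
    awk -v failed="$1" -v total="$2" \
        'BEGIN { printf "%.1f\n", 100 * failed / total }'
}

# The figures reported above: 117 failures in 441 iterations.
failure_rate 117 441   # prints 26.5
```

The per-test counts in the table sum to 117, which matches the reported total.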

Looking only at the 103 failures of gsharing-valuetype-layout.exe, 81 of them 
were the 120-second test timeout. A successful run of this test takes less 
than a second. In the timeout case, mono simply appears to hang.

Running this manually, when it hangs it stops using CPU and strace reports,

# strace -fp 4289
Process 4289 attached with 3 threads
[pid  4292] futex(0x7f626420, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid  4290] futex(0x967340, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid  4289] futex(0x1c84f1c, FUTEX_WAIT_PRIVATE, 3, NULL

Here's a gstack stack trace,

http://sprunge.us/CfjX

This is trivial to reproduce on my system,

# uname -a
Linux linux-mono.com 3.7.10-1.1-desktop #1 SMP PREEMPT Thu Feb 28 15:06:29 
UTC 2013 (82d3f21) x86_64 x86_64 x86_64 GNU/Linux

Running as a VMware virtual machine, 4 CPU, 8 GB RAM.

I use this simple script to repeatedly run the commands,

http://sprunge.us/VKTS

E.g.,

./repeat.sh mono gsharing-valuetype-layout.exe
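
For readers who cannot reach the link, the following is a rough stand-in, not the script linked above. It is a sketch that assumes the coreutils `timeout` command is available; the function name, the argument order, and the summary line are all my own invention.

```shell
# Hypothetical stand-in for repeat.sh (NOT the linked script).
# Usage: repeat_cmd ITERATIONS TIMEOUT_SECONDS command [args...]
repeat_cmd() {
    iterations=$1
    limit=$2
    shift 2
    failures=0
    hangs=0
    i=1
    while [ "$i" -le "$iterations" ]; do
        # coreutils timeout exits with status 124 when the limit is hit,
        # which is how a hang is told apart from an ordinary failure.
        timeout "$limit" "$@" >/dev/null 2>&1
        status=$?
        if [ "$status" -eq 124 ]; then
            hangs=$((hangs + 1))
            failures=$((failures + 1))
        elif [ "$status" -ne 0 ]; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "iterations=$iterations failures=$failures hangs=$hangs"
}

# Example (not run here): 100 iterations with the same 120-second limit
# that make check uses.
#   repeat_cmd 100 120 mono gsharing-valuetype-layout.exe
```

Counting timeouts separately from other non-zero exits keeps the hang case visible, since that is the case of interest here.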

Filed bug 14073 to track this,

https://bugzilla.xamarin.com/show_bug.cgi?id=14073

Looking back at previous failures, I realize that this hang can be worked 
around by disabling AOT using the mono option '-O=-aot'. Ugh. Given that, this 
may be the same as bug 7564,

https://bugzilla.xamarin.com/show_bug.cgi?id=7564
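
For reference, the workaround amounts to passing the option ahead of the executable name. A minimal sketch follows; the wrapper name and the `MONO` override variable are my own additions for illustration, not something from the bug report.

```shell
# Hypothetical wrapper: run a managed executable with the use of
# AOT-compiled code disabled. Set MONO to use a specific runtime binary;
# it defaults to whatever "mono" is on the PATH.
run_without_aot() {
    "${MONO:-mono}" -O=-aot "$@"
}

# Example (not run here):
#   run_without_aot gsharing-valuetype-layout.exe
```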

-Charles


___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list


Re: [Mono-dev] mono-3.2.1 make check failures sgen assertion

2013-08-19 Thread Rodrigo Kumpera
Those hangs are known limitations of our AOT technology, which we have
no plans to fix in the near term.




Re: [Mono-dev] mono-3.2.1 make check failures sgen assertion

2013-08-14 Thread Charles Randall
Continuing to dig into these failures, here is what I've found so far.

The majority of the bug-10127 test failures were due to bug 13604. The fix for 
that bug resolves the assert in sgen-os-posix.c:60, is already in the mono 3.2 
branch, and should be included in the upcoming mono 3.2.2.
 
https://bugzilla.xamarin.com/show_bug.cgi?id=13604

The failures in sgen-weakref-stress were resolved in this fix which is planned 
to be in the upcoming 3.2.2,

https://github.com/mono/mono/commit/aef4b77ea79aa0a4c06e10bd5842da9df0d10973

The majority of delegate2 test failures are due to bug 7564. There is a 
workaround for this listed in the bug report.

https://bugzilla.xamarin.com/show_bug.cgi?id=7564

That bug is pretty disturbing. Once you've determined you need the workaround, 
your application has already hung. If your application is critical to your 
business that's a tough lesson to learn.

The discrepancy between my observed make check failures and the all green 
results of the monkey wrench automated tests appears to be because many tests 
are disabled for monkey wrench. See DISABLED_TESTS_WRENCH in 
mono/tests/Makefile for the details. Notably, bug-10127 is disabled.
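
To see which tests are affected, something like the following sketch can print the contents of that variable. It assumes the variable is written as a backslash-continued list of test names, which I have not verified against the actual file; the function name is hypothetical.

```shell
# Hypothetical helper: print each word assigned to DISABLED_TESTS_WRENCH
# in the given Makefile, following backslash line continuations.
# Assumes a plain "NAME = a b \" style assignment.
wrench_disabled_tests() {
    awk '
        /^DISABLED_TESTS_WRENCH[ \t]*[:+]?=/ { in_var = 1 }
        in_var {
            line = $0
            cont = (line ~ /\\[ \t]*$/)        # does the line continue?
            sub(/\\[ \t]*$/, "", line)         # drop the trailing backslash
            sub(/^DISABLED_TESTS_WRENCH[ \t]*[:+]?=/, "", line)
            n = split(line, words, /[ \t]+/)
            for (i = 1; i <= n; i++)
                if (words[i] != "") print words[i]
            if (!cont) in_var = 0
        }
    ' "$1"
}

# Example (not run here):
#   wrench_disabled_tests mono/tests/Makefile
```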

Continuing these tests with the 3.2 branch and the second fix above.

-Charles




Re: [Mono-dev] mono-3.2.1 make check failures sgen assertion

2013-08-09 Thread Rodrigo Kumpera
There's probably a bug in mono there; someone probably needs to look into it
and fix it.


On Thu, Aug 8, 2013 at 6:53 PM, Charles Randall 
charles.rand...@nirvanix.com wrote:

 Mono developers,

 While trying to track down a mono internal problem related to signals and
 garbage collection, I've been doing some testing with the latest 3.2.1
 release.

 In an attempt to find a test case that's most interesting to this team,
 I'm running OpenSuse 12.3 and repeatedly unpacking 3.2.1, running
 configure, make, and make check. I let this run for ~24 hours which
 resulted in 58 builds/checks. Every one failed one test or another in make
 check. This is in stark contrast to the status reported by monkey wrench
 for mono-dist-3.2.1-release on OpenSuse (all green).

 I'm new to OpenSuse, but I just did a fresh install and zypper -n in -t
 pattern devel_C_C++ to get a development environment. Other than that, I'm
 just running the Makefile appended below over and over again.

 My system is,

 # cat /etc/SuSE-release
 openSUSE 12.3 (x86_64)
 VERSION = 12.3
 CODENAME = Dartmouth
 # uname -a
 Linux linux-mono.nirvanix.com 3.7.10-1.1-desktop #1 SMP PREEMPT Thu Feb
 28 15:06:29 UTC 2013 (82d3f21) x86_64 x86_64 x86_64 GNU/Linux

 The mono I end up with is,

 # mono --version
 Mono JIT compiler version 3.2.1 (tarball Tue Aug  6 14:43:27 MDT 2013)
 Copyright (C) 2002-2012 Novell, Inc, Xamarin Inc and Contributors.
 www.mono-project.com
 TLS:           __thread
 SIGSEGV:       altstack
 Notifications: epoll
 Architecture:  amd64
 Disabled:      none
 Misc:          softdebug
 LLVM:          supported, not enabled.
 GC:            sgen

 Here's a count of the failures from those runs,

  25 bug-10127.exe
  13 gsharing-valuetype-layout.exe
   4 sgen-weakref-stress.exe|ms-par
   3 sgen-weakref-stress.exe|ms-split
   3 sgen-weakref-stress.exe|ms-conc
   2 sgen-weakref-stress.exe|plain
   2 delegate2.exe
   1 sgen-weakref-stress.exe|ms-split-95
   1 sgen-weakref-stress.exe|ms-conc-split
   1 sgen-bridge.exe|ms-split
   1 appdomain-unload.exe

 Note that the total number of test failures is greater than the 58
 iterations because sometimes more than one test failed per iteration. I
 didn't dig into the failures, but note that bug-10127.exe fails on 43% of
 the runs (25/58).

 I'm most interested in assertion failures in the bug-10127.exe failures as
 they look similar to my application failures on another platform.
 Specifically, here's a manual recompile and run of that test (it doesn't
 fail every time),

 # mcs bug-10127.cs
 # mono bug-10127.exe
 Starting cache testers
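 * Assertion at sgen-os-posix.c:60, condition `info->doing_handshake' not
 met ...
 =================================================================
 Got a SIGABRT while executing native code. This usually indicates a fatal
 error in the mono runtime or one of the native libraries used by your
 application.
 =================================================================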

 Here are a few examples of the bug-10127.exe failure stack traces from
 manual runs as described above,

 http://sprunge.us/iHFX
 http://sprunge.us/cOEU
 http://sprunge.us/VKRg

 For completeness, the only thing that I can think of that may be different
 about my very simple configuration is that my OpenSuse system is a virtual
 machine (4 core, 4 GB RAM) running on VMware ESXi. I suspect that this is
 subtly altering the timing of execution and exposing latent bugs.

 These appear to be related,

 https://github.com/mono/mono/pull/720


 http://stackoverflow.com/questions/17937222/mono-3-2-0-process-crashes-on-sgen-os-posix-info-handshake-not-met

 From what I've described, am I doing anything wrong? Anyone else seeing
 something similar?

 -Charles

 --- snip ---
 MONO_VER=3.2.1
 MONO_DIST=mono-${MONO_VER}.tar.bz2
 MONO_DIR=mono-${MONO_VER}

 all: check.done

 # Unpack the release tarball.
 extract.done:
 	@echo
 	@echo EXTRACT
 	@echo
 	tar jxvf ${MONO_DIST} 2>&1
 	touch extract.done

 # Configure the unpacked tree with a scratch install prefix.
 configure.done: extract.done
 	@echo
 	@echo CONFIGURE
 	@echo
 	(cd ${MONO_DIR} && ./configure --prefix=/tmp/mono) 2>&1
 	touch configure.done

 # Build with four parallel jobs.
 build.done: configure.done
 	@echo
 	@echo BUILD
 	@echo
 	make -C ${MONO_DIR} -j 4 2>&1
 	touch build.done

 # Run the test suite.
 check.done: build.done
 	@echo
 	@echo CHECK
 	@echo
 	make -C ${MONO_DIR} check

Re: [Mono-dev] mono-3.2.1 make check failures sgen assertion

2013-08-09 Thread Rodrigo Kumpera
Hi Charles,

The weakref stress failures have been fixed in mono master and will be part of 3.3.0.

Could you post crash logs for the other crashers?

In particular:


 25 bug-10127.exe
 13 gsharing-valuetype-layout.exe
  2 delegate2.exe
  1 sgen-bridge.exe|ms-split
  1 appdomain-unload.exe



On Fri, Aug 9, 2013 at 11:29 AM, Rodrigo Kumpera kump...@gmail.com wrote:

 This is being tracked in Xamarin's bugzilla:
 https://bugzilla.xamarin.com/show_bug.cgi?id=13604

