What this is: The Mono team has a CI (continuous integration) system which 
builds and runs automated tests on every commit checked in to git (specifically 
the master branch). We have a test log viewer on Jenkins that tracks the 
results 
<https://jenkins.mono-project.com/view/All/job/jenkins-testresult-viewer/Test_Result_View/> 
(currently only accessible to GitHub project admins, sorry). Once a week I 
sweep through and write an email with a list of the most frequently failing 
automated tests. This is both so that everyone on the team is aware of our 
current stability level, and so that when people see failures in the GitHub PR 
tests they know whether to treat them as known bugs or new failures. In the 
interest of making our development process more open, I’ve started 
crossposting this weekly email on the public mailing list.
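
As a rough illustration of the weekly sweep described above, here is a minimal 
Python sketch that tallies failures per test and lists the most frequent ones. 
The record shape (build id, test name, passed flag), the function name, and the 
sample rows are all made up for illustration; this is not how the Jenkins 
viewer stores or exports its results.

```python
from collections import Counter


def most_frequent_failures(results, top=3):
    """Return the `top` most frequently failing test names with counts.

    `results` is an iterable of (build_id, test_name, passed) tuples.
    This record shape is hypothetical, not the Jenkins viewer's format.
    """
    failures = Counter(name for _build, name, passed in results if not passed)
    return failures.most_common(top)


# Placeholder records for illustration only; not real CI data.
sample = [
    (1, "SuiteA.TestX", False),
    (1, "SuiteA.TestY", True),
    (2, "SuiteA.TestX", False),
    (2, "SuiteA.TestY", False),
    (3, "SuiteA.TestX", True),
]

print(most_frequent_failures(sample))
```

Dividing each count by the number of builds in the period would give a failure 
rate of the kind quoted for individual tests in the list that follows.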

Stability in the CI/master tests is pretty bad lately and we’ve not been 
producing many green builds. The top test failures are the same as they’ve been 
for the last couple of weeks (and mostly things that aren’t user-impacting: 
they only happen on the builder, or happen during process shutdown), but there 
are some new, less frequent bugs that are very worrisome. Please do not miss 
the bugs from #4 on; these are new and some do not have owners.

The top recurring failures currently ruining Jenkins builds are:

1. MonoTests.System.Net.Sockets.SocketTest.SendAsyncFile

Filed as https://bugzilla.xamarin.com/show_bug.cgi?id=43172 , currently 
assigned to Marcos Heinrich.

This has been failing for a pretty long time. It only occurs on Linux but on 
Linux it fails over 20% of the time. (It has also been seen on Android.) It is 
possible this is only an issue in CI (see akoeplinger note in bug).

The failure is consistent and looks like:


MESSAGE:
System.Exception : Could not abort registered blocking threads before closing socket.
Thread StackTrace:
  at System.Net.Sockets.SafeSocketHandle.RegisterForBlockingSyscall () 
[0x00057] in 
/mnt/jenkins/workspace/test-mono-mainline-linux/label/ubuntu-1404-amd64/mcs/class/System/System.Net.Sockets/SafeSocketHandle.cs:114
  at System.Net.Sockets.Socket.SendFile_internal 
(System.Net.Sockets.SafeSocketHandle safeHandle, System.String filename, 
System.Byte[] pre_buffer, System.Byte[] post_buffer, 
System.Net.Sockets.TransmitFileOptions flags) [0x00000] in 
/mnt/jenkins/workspace/test-mono-mainline-linux/label/ubuntu-1404-amd64/mcs/class/System/System.Net.Sockets/Socket.cs:2944
  at System.Net.Sockets.Socket.SendFile (System.String fileName, System.Byte[] 
preBuffer, System.Byte[] postBuffer, System.Net.Sockets.TransmitFileOptions 
flags) [0x00028] in 
/mnt/jenkins/workspace/test-mono-mainline-linux/label/ubuntu-1404-amd64/mcs/class/System/System.Net.Sockets/Socket.cs:2893

[snip]

Examples:

https://jenkins.mono-project.com/job/test-mono-mainline-linux/label=ubuntu-1404-amd64/556/testReport/MonoTests.System.Net.Sockets/SocketTest/SendAsyncFile/
https://jenkins.mono-project.com/job/test-mono-mainline-linux/label=ubuntu-1404-i386/558/testReport/MonoTests.System.Net.Sockets/SocketTest/SendAsyncFile/

1.5. MonoTests.Remoting.RemotingServicesTest.MarshalThrowException

On ARM64 only, when this test calls ChannelServices.UnregisterChannel(), 
sometimes a KeyNotFoundException is generated somewhere in the guts of 
Socket.Close. This is filed as 
https://bugzilla.xamarin.com/show_bug.cgi?id=43727 . It is possible this is the 
same issue as #1 above (see akoeplinger note in bug).

Examples:

https://jenkins.mono-project.com/job/test-mono-mainline-linux/label=debian-8-arm64/641/testReport/MonoTests.Remoting/RemotingServicesTest/MarshalThrowException/
https://jenkins.mono-project.com/job/test-mono-mainline-linux/label=debian-8-arm64/636/testReport/MonoTests.Remoting/RemotingServicesTest/MarshalThrowException/

2. ThreadAbortException in System.Threading.Timer+Scheduler.SchedulerThread

Filed as https://bugzilla.xamarin.com/show_bug.cgi?id=43320 , currently 
assigned to Rodrigo.

This occurs in many different places but the crash message always looks the 
same. It is believed to be existing bad behavior brought into the light by 
recent fixes by Vargaz around finalizers and VM shutdown.


Unhandled Exception:
System.TypeInitializationException: The type initializer for 
'System.Collections.Generic.List`1' threw an exception. ---> 
System.Threading.ThreadAbortException
   --- End of inner exception stack trace ---
  at System.Threading.Timer+Scheduler.SchedulerThread () [0x0000f] in <filename 
unknown>:0
  at System.Threading.ThreadHelper.ThreadStart_Context (System.Object state) 
[0x00017] in <filename unknown>:0
  at System.Threading.ExecutionContext.RunInternal 
(System.Threading.ExecutionContext executionContext, 
System.Threading.ContextCallback callback, System.Object state, System.Boolean 
preserveSyncCtx) [0x0008d] in <filename unknown>:0
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext 
executionContext, System.Threading.ContextCallback callback, System.Object 
state, System.Boolean preserveSyncCtx) [0x00000] in <filename unknown>:0
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext 
executionContext, System.Threading.ContextCallback callback, System.Object 
state) [0x00031] in <filename unknown>:0
  at System.Threading.ThreadHelper.ThreadStart () [0x0000b] in <filename 
unknown>:0
[MVID] 0deb57f9de664ff681556c641423618d 0,1,2,3,4,5
[ERROR] FATAL UNHANDLED EXCEPTION: Nested exception trying to figure out what 
went wrong


Some places this failure is seen include 
MonoTests.gshared.generic-marshalbyref.2.exe, MonoTests.runtime.bug-415577.exe, 
and as an unknown-test failure when a test suite (such as mcs/class/corlib) is 
shutting down.

Examples:

https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-amd64/4606/testReport/MonoTests/gshared/generic_marshalbyref_2_exe_3/
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-amd64/4607/testReport/MonoTests/gshared/generic_marshalbyref_2_exe/
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-i386/4608/testReport/MonoTests/runtime/bug_415577_exe/
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-i386/4656/parsed_console/log_content.html#WARNING1
 (test shutdown)

3. __icall_wrapper_mono_gc_alloc_vector crash

Filed as https://bugzilla.xamarin.com/show_bug.cgi?id=43921 , currently 
assigned to Aleksey.

There are two problems here:
1. There appears to be a race condition in coop; Aleksey is looking at this.
2. There appears to be a problem where we are not scanning pointers in SIMD 
registers. If a memory copy, such as the one in 
__icall_wrapper_mono_gc_alloc_vector, happens to use a SIMD register, and the 
copy is interrupted by a GC, it will lead to memory corruption. This issue is 
being targeted by https://github.com/mono/mono/pull/3364 , which is still under 
development.

The symptom we see is SIGSEGVs in a range of tests related to domain unloading, 
or to thread creation around the same time as the GC stops the world. This 
symptom occurs on Mac only, we think because Mac clang is more aggressive and 
is optimizing our memory copy routine to use SIMD instructions.

https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-i386/4742/testReport/MonoTests/sgen-regular-tests-ms-split-95/sgen_domain_unload_2_exe/
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-i386/4744/testReport/MonoTests/sgen-regular-tests-ms-split-clear-at-gc/sgen_new_threads_dont_join_stw_2_exe/
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-i386/4812/parsed_console/log_content.html#WARNING2

3.5 (?). AppDomain.InternalUnload crash

This is also Mac-only; Aleksey is looking into whether it is the same failure 
as #X.

https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-i386/4812/
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-i386/4811/parsed_console/log_content.html#WARNING1
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-amd64/4813/testReport/MonoTests/sgen-regular-tests-plain/sgen_domain_unload_exe_timedout/
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-amd64/4812/testReport/
 (both failures)
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-amd64/4811/testReport/MonoTests/runtime/remoting4_exe_timedout/

When it crashes, the managed stack looks like:

  at (wrapper managed-to-native) System.AppDomain.InternalUnload (int) <0x00012>
  at System.AppDomain.Unload (System.AppDomain) [0x00011] in 
/Users/builder/jenkins/workspace/test-mono-mainline/label/osx-i386/mcs/class/corlib/System/AppDomain.cs:1200
  at MonoTests.System.AppDomainTest.TearDown () [0x0000b] in 
/Users/builder/jenkins/workspace/test-mono-mainline/label/osx-i386/mcs/class/corlib/Test/System/AppDomainTest.cs:71
  at (wrapper runtime-invoke) object.runtime_invoke_void__this__ 
(object,intptr,intptr,intptr) <IL 0x0004f, 0x00092>


...

4. Tarjan GC bridge crashing for armel binaries on ARM64 host

sgen-bridge.exe and sgen-bridge-major-fragmentation.exe, which run with a 
simulated version of the Android GC bridge, have in the last week started 
segfaulting about 1/3 of the time on the ARM soft float build (but never on any 
other platform). I think the crashes are 1:1 with the soft float binary being 
run on an ARM64 builder (this only happens some percentage of the time; it 
depends on what’s available on the load balancer). The stacks are consistent.

Filed as https://bugzilla.xamarin.com/show_bug.cgi?id=44397 , currently 
assigned to me.

5. Crash doing thread join while closing process after ServiceModel tests

Both crashes and hangs have been seen recently while running the ServiceModel 
test suites. This has been seen on both Mac and Linux (I think it likes to 
crash on Mac and hang on Linux?). Nothing is filed. When the crash occurs, it 
happens in the test runner itself, while it is waiting for the tests to finish.

https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-amd64/4808/parsed_console/log_content.html#WARNING1
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-amd64/4794/parsed_console/log_content.html#WARNING2
https://jenkins.mono-project.com/job/test-mono-mainline-linux/label=debian-8-arm64/739/parsed_console/log_content.html#WARNING1

Managed stack looks like:

  at (wrapper managed-to-native) System.Threading.Thread.JoinInternal 
(System.Threading.Thread,int) <IL 0x00014, 0x00067>
  at System.Threading.Thread.Join () [0x00000] in 
/Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/class/referencesource/mscorlib/system/threading/thread.cs:697
  at NUnit.Core.TestRunnerThread.Wait () [0x00010] in 
/Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/TestRunnerThread.cs:118
  at NUnit.Core.ThreadedTestRunner.Wait () [0x0000b] in 
/Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/ThreadedTestRunner.cs:63
  at NUnit.Core.ThreadedTestRunner.EndRun () [0x00000] in 
/Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/ThreadedTestRunner.cs:55
  at NUnit.Core.ThreadedTestRunner.Run 
(NUnit.Core.EventListener,NUnit.Core.ITestFilter) [0x00008] in 
/Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/ThreadedTestRunner.cs:36
  at NUnit.Core.ProxyTestRunner.Run 
(NUnit.Core.EventListener,NUnit.Core.ITestFilter) [0x00007] in 
/Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/ProxyTestRunner.cs:133
  at NUnit.Core.RemoteTestRunner.Run 
(NUnit.Core.EventListener,NUnit.Core.ITestFilter) [0x0002b] in 
/Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/RemoteTestRunner.cs:63


Native stack, when we get one, looks like:

        0   mono                                0x00000001073a7d5a 
mono_handle_native_sigsegv + 282
        1   libsystem_platform.dylib            0x00007fff91ff152a _sigtramp + 
26
        2   ???                                 0x00000001081b9a00 0x0 + 
4430993920
        3   mono                                0x000000010756f763 
mono_os_cond_timedwait + 163
        4   mono                                0x000000010756e326 
mono_w32handle_timedwait_signal_handle + 358
        5   mono                                0x000000010756e0e1 
mono_w32handle_wait_one + 897
        6   mono                                0x00000001075535f9 
wapi_WaitForSingleObjectEx + 9
        7   mono                                0x00000001074a2cfe 
ves_icall_System_Threading_Thread_Join_internal + 174

6. ServiceModel contract tests fail with the wrong exception

In about 4 random tests over a week, all on Linux, a test of the contract 
capability in ServiceModel failed with an ObjectDisposedException where it 
expected a contract-wrong exception. Filed, not assigned, but it is in a Class 
Libraries component so Marek Safar is aware of it: 
https://bugzilla.xamarin.com/show_bug.cgi?id=44650

7. handle_ops[type] exception in w32handle on thread abort

In about 4 random tests over a week, all on Mac, the regression test for bug 
561239 is failing with the assert
"Assertion at w32handle.c:809, condition `handle_ops [type]' not met"
while aborting a thread.

Filed, not assigned: https://bugzilla.xamarin.com/show_bug.cgi?id=44651

8. JIT exception during XSL tests

In about 4 random tests over a week, all on ARM64, one of the XSL tests fails 
with
"Assertion at mini-arm64.c:937, condition `arm_is_bl_disp ((code), (target))' 
not met"
We have a managed stack but not a native one.

Filed, not assigned: https://bugzilla.xamarin.com/show_bug.cgi?id=44659
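
If you want a quick way to check whether a failure in a PR run is one of the 
known ones, something like the following Python sketch might help. The 
signature strings and bug numbers are copied from the items in this email; the 
function name and the approach of matching by plain substring are my own 
assumptions, and nothing like this exists in our CI tooling as far as this 
email is concerned. It only covers the failures that have a distinctive message.

```python
# Signature strings and bug numbers are taken from this email.
# The table layout and function name are hypothetical.
KNOWN_FAILURES = [
    ("Could not abort registered blocking threads before closing socket", 43172),
    ("System.Threading.Timer+Scheduler.SchedulerThread", 43320),
    ("__icall_wrapper_mono_gc_alloc_vector", 43921),
    ("condition `handle_ops [type]' not met", 44651),
    ("condition `arm_is_bl_disp ((code), (target))' not met", 44659),
]

BUG_URL = "https://bugzilla.xamarin.com/show_bug.cgi?id={}"


def match_known_failures(log_text):
    """Return bug URLs whose signature appears in log_text.

    Whitespace runs (including line breaks) are collapsed to single
    spaces first, so hard-wrapped logs still match.
    """
    normalized = " ".join(log_text.split())
    return [BUG_URL.format(bug) for sig, bug in KNOWN_FAILURES if sig in normalized]


if __name__ == "__main__":
    wrapped = (
        "System.Exception : Could not\n"
        "abort registered blocking threads before closing socket."
    )
    for url in match_known_failures(wrapped):
        print(url)
```

An empty result does not prove the failure is new; it only means none of the 
listed messages appeared in the text you passed in.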
