Yes, the plan is to make Ignite 2.8 officially compatible with .NET Core 3.0.
On Wed, Sep 18, 2019 at 3:17 PM Eduard Llull wrote:
> Hello Pavel,
>
> I also ran into the IGNITE_HOME detection issue when running the tests with .NET Core 3.0. To make my reproducer start I had to copy the jars in the libs/ directory alongside the Apache.Ignite.dll, but I forgot to mention it.
>
> When you say that you plan to make the next Ignite release fully compatible with .NET Core 3.0, do you mean Ignite 2.8 or a later release?
>
> I will let you know if we encounter any other issues.
>
> Best regards.
>
> On Wed, Sep 18, 2019 at 1:55 PM Pavel Tupitsyn wrote:
>> Hi Eduard,
>>
>> First of all, thank you so much for such a detailed report; this is extremely valuable!
>> I've updated our troubleshooting guide:
>> https://apacheignite-net.readme.io/docs/troubleshooting
>>
>> Yes, the JVM installs its own signal handlers:
>> https://docs.oracle.com/javase/9/troubleshoot/handle-signals-and-exceptions.htm
>> This includes SIGSEGV, which is used to handle NullPointerException in Java, and it conflicts with a similar mechanism in .NET.
>> There is the -Xrs option to reduce signal usage, but unfortunately it does not get rid of the SIGSEGV handler.
>>
>> As for .NET Core 3.0: I have it on my machine and run some Ignite tests with it from time to time.
>> So far the only issue was with IGNITE_HOME detection with NuGet:
>> https://issues.apache.org/jira/browse/IGNITE-10554, and it has workarounds (copy the jar files manually or with a build step).
>> Let me know if you encounter anything else with .NET Core 3.0; we plan to make the next Ignite release fully compatible with it.
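The jar-copying workaround mentioned above can be written as a POSIX shell function and run as a post-build step. This is a minimal sketch, and it rests on several assumptions. It assumes the Ignite jars sit flat in one source directory. One example is a libs/ folder inside the Apache.Ignite NuGet package under ~/.nuget/packages, though that location is an assumption and does not come from the thread. It also assumes Ignite finds the jars in a libs/ directory next to the application's assemblies, as Eduard describes. The function name copy_ignite_jars is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ignite): copy the Ignite *.jar files into a
# libs/ directory next to the application's output assemblies, as a workaround
# for IGNITE_HOME detection (IGNITE-10554). Both paths are assumptions.
copy_ignite_jars() {
    src_libs="$1"   # directory holding the Ignite jars (assumed flat, no subdirectories)
    out_dir="$2"    # application output directory containing the .dll files

    if [ ! -d "$src_libs" ]; then
        echo "copy_ignite_jars: source directory not found: $src_libs" >&2
        return 1
    fi

    mkdir -p "$out_dir/libs" || return 1

    # Copy each top-level jar; an unmatched glob stays literal and is skipped.
    found=0
    for jar in "$src_libs"/*.jar; do
        [ -e "$jar" ] || continue
        cp "$jar" "$out_dir/libs/" || return 1
        found=1
    done

    # Treat "no jars copied" as an error so a wrong source path fails the build.
    if [ "$found" -eq 0 ]; then
        echo "copy_ignite_jars: no .jar files in $src_libs" >&2
        return 1
    fi
}
```

Example usage, with illustrative paths: copy_ignite_jars "$HOME/.nuget/packages/apache.ignite/2.7.5/libs" bin/Debug/netcoreapp3.0. The function does not copy subdirectories of the source directory.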
>>
>> Thanks,
>> Pavel
>>
>> On Wed, Sep 18, 2019 at 1:48 PM Eduard Llull wrote:
>>> Hi everyone,
>>>
>>> Almost a month ago I reported that one of our applications using the Ignite.NET thick client was getting SIGSEGVs and SIGABRTs. Switching to the thin client fixed that problem, but performance was severely impacted [http://apache-ignite-users.70518.x6.nabble.com/NET-thin-client-multithreaded-td29116.html#a29142].
>>> We suspected that it was related to the fact that the embedded JVM installs its own signal handlers, but we had no evidence.
>>>
>>> We have been digging into this problem and today we found the cause. This will be a long email.
>>>
>>> The reproducer is quite simple:
>>>
>>> using System;
>>> using Apache.Ignite.Core;
>>>
>>> namespace segfault
>>> {
>>>     class Program
>>>     {
>>>         static void Main(string[] args)
>>>         {
>>>             if (args.Length == 0)
>>>             {
>>>                 Console.WriteLine("Starting Ignite");
>>>                 var thick = Ignition.Start(); // loads the embedded JVM, which installs its own signal handlers
>>>             }
>>>             else
>>>             {
>>>                 Console.WriteLine("NOT starting Ignite");
>>>             }
>>>
>>>             string s = null;
>>>             try
>>>             {
>>>                 s.ToUpper(); // null dereference: SIGSEGV, normally translated into a NullReferenceException by the CLR
>>>             }
>>>             catch (NullReferenceException e)
>>>             {
>>>                 Console.WriteLine("Catched exception " + e);
>>>             }
>>>         }
>>>     }
>>> }
>>>
>>> If executed as a netcoreapp2.2 application on Linux (tested on Ubuntu 19.04; I haven't tested it on Windows) without any argument (so it calls Ignition.Start()), it crashes.
>>>
>>> $ dotnet run
>>> Starting Ignite
>>> [12:17:55]    __________  ________________
>>> [12:17:55]   /  _/ ___/ |/ /  _/_  __/ __/
>>> [12:17:55]  _/ // (7 7    // /  / / / _/
>>> [12:17:55] /___/\___/_/|_/___/ /_/ /___/
>>> [12:17:55]
>>> [12:17:55] ver. 2.7.5#20190603-sha1:be4f2a15
>>> [12:17:55] 2018 Copyright(C) Apache Software Foundation
>>> [12:17:55]
>>> [12:17:55] Ignite documentation: http://ignite.apache.org
>>> [12:17:55]
>>> [12:17:55] Quiet mode.
>>> [12:17:55]   ^-- Logging by 'JavaLogger [quiet=true, config=null]'
>>> [12:17:55]   ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or "-v" to ignite.{sh|bat}
>>> [12:17:55]
>>> [12:17:55] OS: Linux 5.0.0-25-generic amd64
>>> [12:17:55] VM information: OpenJDK Runtime Environment 1.8.0_222-8u222-b10-1ubuntu1~19.04.1-b10 Private Build OpenJDK 64-Bit Server VM 25.222-b10
>>> [12:17:55] Please set system property '-Djava.net.preferIPv4Stack=true' to avoid possible problems in mixed environments.
>>> [12:17:55] Initial heap size is 250MB (should be no less than 512MB, use -Xms512m -Xmx512m).
>>> [12:17:55] Configured plugins:
>>> [12:17:55]   ^-- None
>>> [12:17:55]
>>> [12:17:55] Configured failure handler: [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]]
>>> [12:17:55] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
>>> [12:17:55] Security status [authentication=off, tls/ssl=off]
>>> [12:17:57] Performance suggestions for grid (fix if possible)
>>> [12:17:57] To disable, set -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true
>>> [12:17:57]   ^-- Enable G1 Garbage Collector (add '-XX:+UseG1GC' to JVM options)
>>> [12:17:57]   ^-- Specify JVM heap max size (add '-Xmx<size>[g|G|m|M|k|K]' to JVM options)
>>> [12:17:57]   ^-- Set max direct memory size if getting 'OOME: Direct buffer memory' (add '-XX:MaxDirectMemorySize=<size>[g|G|m|M|k|K]' to JVM options)
>>> [12:17:57]   ^-- Disable processing of calls to System.gc() (add '-XX:+DisableExplicitGC' to JVM options)
>>> [12:17:57] Refer to this page for more performance suggestions: https://apacheignite.readme.io/docs/jvm-and-system-tuning
>>> [12:17:57]
>>> [12:17:57] To start Console Management & Monitoring run ignitevisorcmd.{sh|bat}
>>> [12:17:57] Data Regions Configured:
>>> [12:17:57]   ^-- default [initSize=256,0 MiB, maxSize=3,1 GiB, persistence=false]
>>> [12:17:57]
>>> [12:17:57] Ignite node started OK (id=5dd14995)
>>> [12:17:57] Topology snapshot [ver=1, locNode=5dd14995, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=3.1GB, heap=3.5GB]
>>> *** stack smashing detected ***: <unknown> terminated
>>>
>>> If executed with any argument (so it does not start Ignite), the caught NullReferenceException is printed on the console:
>>>
>>> $ dotnet run 1
>>> NOT starting Ignite
>>> Catched exception System.NullReferenceException: Object reference not set to an instance of an object.
>>>    at segfault.Program.Main(String[] args) in /home/eduard/Development/X-files/segfault-2/Program.cs:line 23
>>>
>>> So our guess about the signal handlers looked right, and it was confirmed when we found these issues in the coreclr GitHub project:
>>>
>>> 1. Stack Smashing Failures (SIGSEGV) instead of NullReferenceExceptions [https://github.com/dotnet/coreclr/issues/25166]
>>> 2. SIGSEGV is not transformed into NullReferenceException in WSL [https://github.com/dotnet/coreclr/issues/25945]
>>>
>>> The .NET Core CLR handles SIGSEGV on an alternate signal stack. The problem arises when the signal handler registered by the third-party native library (libjvm.so) chains to the CLR signal handler, because the CLR handler is then not running on that alternate stack. The CLR handler cannot deal with that case, and the program just exits.
>>>
>>> This seems to be solved in .NET Core SDK 3.0 (tested by running the application as netcoreapp3.0 with SDK 3.0.100-rc1-014190), but you have to set the environment variable COMPlus_EnableAlternateStackCheck=1 to enable the alternate stack check [https://github.com/dotnet/coreclr/issues/25945#issuecomment-517199962].
>>>
>>> Without COMPlus_EnableAlternateStackCheck, it still segfaults on .NET Core 3.0:
>>>
>>> $ grep netcoreapp segfault-2.csproj; dotnet run
>>> <TargetFramework>netcoreapp3.0</TargetFramework>
>>> Starting Ignite
>>> [12:33:38]    __________  ________________
>>> [12:33:38]   /  _/ ___/ |/ /  _/_  __/ __/
>>> [12:33:38]  _/ // (7 7    // /  / / / _/
>>> [12:33:38] /___/\___/_/|_/___/ /_/ /___/
>>> [12:33:38]
>>> [12:33:38] ver. 2.7.5#20190603-sha1:be4f2a15
>>> [12:33:38] 2018 Copyright(C) Apache Software Foundation
>>> [12:33:38]
>>> [12:33:38] Ignite documentation: http://ignite.apache.org
>>> [12:33:38]
>>> [12:33:38] Quiet mode.
>>> [12:33:38]   ^-- Logging by 'JavaLogger [quiet=true, config=null]'
>>> [12:33:38]   ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or "-v" to ignite.{sh|bat}
>>> [12:33:38]
>>> [12:33:38] OS: Linux 5.0.0-25-generic amd64
>>> [12:33:38] VM information: OpenJDK Runtime Environment 1.8.0_222-8u222-b10-1ubuntu1~19.04.1-b10 Private Build OpenJDK 64-Bit Server VM 25.222-b10
>>> [12:33:39] Please set system property '-Djava.net.preferIPv4Stack=true' to avoid possible problems in mixed environments.
>>> [12:33:39] Initial heap size is 250MB (should be no less than 512MB, use -Xms512m -Xmx512m).
>>> [12:33:39] Configured plugins:
>>> [12:33:39]   ^-- None
>>> [12:33:39]
>>> [12:33:39] Configured failure handler: [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]]
>>> [12:33:39] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
>>> [12:33:39] Security status [authentication=off, tls/ssl=off]
>>> [12:33:40] Performance suggestions for grid (fix if possible)
>>> [12:33:40] To disable, set -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true
>>> [12:33:40]   ^-- Enable G1 Garbage Collector (add '-XX:+UseG1GC' to JVM options)
>>> [12:33:40]   ^-- Specify JVM heap max size (add '-Xmx<size>[g|G|m|M|k|K]' to JVM options)
>>> [12:33:40]   ^-- Set max direct memory size if getting 'OOME: Direct buffer memory' (add '-XX:MaxDirectMemorySize=<size>[g|G|m|M|k|K]' to JVM options)
>>> [12:33:40]   ^-- Disable processing of calls to System.gc() (add '-XX:+DisableExplicitGC' to JVM options)
>>> [12:33:40] Refer to this page for more performance suggestions: https://apacheignite.readme.io/docs/jvm-and-system-tuning
>>> [12:33:40]
>>> [12:33:40] To start Console Management & Monitoring run ignitevisorcmd.{sh|bat}
>>> [12:33:40] Data Regions Configured:
>>> [12:33:40]   ^-- default [initSize=256,0 MiB, maxSize=3,1 GiB, persistence=false]
>>> [12:33:40]
>>> [12:33:40] Ignite node started OK (id=711e0976)
>>> [12:33:40] Topology snapshot [ver=1, locNode=711e0976, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=3.1GB, heap=3.5GB]
>>> *** stack smashing detected ***: <unknown> terminated
>>>
>>> With COMPlus_EnableAlternateStackCheck set, the exception is caught:
>>>
>>> $ grep netcoreapp segfault-2.csproj; COMPlus_EnableAlternateStackCheck=1 dotnet run
>>> <TargetFramework>netcoreapp3.0</TargetFramework>
>>> Starting Ignite
>>> [12:35:20]    __________  ________________
>>> [12:35:20]   /  _/ ___/ |/ /  _/_  __/ __/
>>> [12:35:20]  _/ // (7 7    // /  / / / _/
>>> [12:35:20] /___/\___/_/|_/___/ /_/ /___/
>>> [12:35:20]
>>> [12:35:20] ver. 2.7.5#20190603-sha1:be4f2a15
>>> [12:35:20] 2018 Copyright(C) Apache Software Foundation
>>> [12:35:20]
>>> [12:35:20] Ignite documentation: http://ignite.apache.org
>>> [12:35:20]
>>> [12:35:20] Quiet mode.
>>> [12:35:20]   ^-- Logging by 'JavaLogger [quiet=true, config=null]'
>>> [12:35:20]   ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or "-v" to ignite.{sh|bat}
>>> [12:35:20]
>>> [12:35:20] OS: Linux 5.0.0-25-generic amd64
>>> [12:35:20] VM information: OpenJDK Runtime Environment 1.8.0_222-8u222-b10-1ubuntu1~19.04.1-b10 Private Build OpenJDK 64-Bit Server VM 25.222-b10
>>> [12:35:20] Please set system property '-Djava.net.preferIPv4Stack=true' to avoid possible problems in mixed environments.
>>> [12:35:20] Initial heap size is 250MB (should be no less than 512MB, use -Xms512m -Xmx512m).
>>> [12:35:21] Configured plugins:
>>> [12:35:21]   ^-- None
>>> [12:35:21]
>>> [12:35:21] Configured failure handler: [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]]
>>> [12:35:21] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
>>> [12:35:21] Security status [authentication=off, tls/ssl=off]
>>> [12:35:22] Performance suggestions for grid (fix if possible)
>>> [12:35:22] To disable, set -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true
>>> [12:35:22]   ^-- Enable G1 Garbage Collector (add '-XX:+UseG1GC' to JVM options)
>>> [12:35:22]   ^-- Specify JVM heap max size (add '-Xmx<size>[g|G|m|M|k|K]' to JVM options)
>>> [12:35:22]   ^-- Set max direct memory size if getting 'OOME: Direct buffer memory' (add '-XX:MaxDirectMemorySize=<size>[g|G|m|M|k|K]' to JVM options)
>>> [12:35:22]   ^-- Disable processing of calls to System.gc() (add '-XX:+DisableExplicitGC' to JVM options)
>>> [12:35:22] Refer to this page for more performance suggestions: https://apacheignite.readme.io/docs/jvm-and-system-tuning
>>> [12:35:22]
>>> [12:35:22] To start Console Management & Monitoring run ignitevisorcmd.{sh|bat}
>>> [12:35:22] Data Regions Configured:
>>> [12:35:22]   ^-- default [initSize=256,0 MiB, maxSize=3,1 GiB, persistence=false]
>>> [12:35:22]
>>> [12:35:22] Ignite node started OK (id=841d9bca)
>>> [12:35:22] Topology snapshot [ver=1, locNode=841d9bca, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=3.1GB, heap=3.5GB]
>>> Catched exception System.NullReferenceException: Object reference not set to an instance of an object.
>>>    at segfault.Program.Main(String[] args) in /home/eduard/Development/X-files/segfault-2/Program.cs:line 23
>>>
>>> Our plan is to change our application to use .NET Core 3.0 and the thick client. We know that .NET Core 3.0 is currently in RC, but it is expected to be released on the 23rd of September. We will be running in-depth tests to see whether anything breaks, so we expect 3.0 to be released by the time we decide to deploy to production.
>>>
>>> So, first of all, I wanted to let you know about this issue in case anybody ends up in the same situation.
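COMPlus_EnableAlternateStackCheck has to be present in the process environment when the .NET runtime starts; in the run above it is set on the command line. To avoid forgetting it in production, the application can be started through a small wrapper. This is a minimal sketch, assuming a POSIX shell. The function name run_with_alt_stack_check is hypothetical. Per the thread, the fix this variable enables was only observed with .NET Core 3.0 and not with netcoreapp2.2.

```shell
#!/bin/sh
# Hypothetical launcher: run a command with COMPlus_EnableAlternateStackCheck=1
# in its environment, so that (per the thread, on .NET Core 3.0) the CLR can
# still turn SIGSEGV into a NullReferenceException after libjvm.so has
# installed its own signal handlers.
run_with_alt_stack_check() {
    if [ "$#" -eq 0 ]; then
        echo "usage: run_with_alt_stack_check command [args...]" >&2
        return 2
    fi
    # env(1) sets the variable only for the launched command, not for the
    # calling shell; the command's exit status is passed through unchanged.
    env COMPlus_EnableAlternateStackCheck=1 "$@"
}
```

For example, run_with_alt_stack_check dotnet run. Because env(1) is used, the calling shell's environment is not modified.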
>>>
>>> And finally, do you guys foresee any problems with the migration?
