Yes, the plan is to make Ignite 2.8 officially compatible with .NET Core 3.0.
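
For anyone who finds this thread later, here is a minimal shell sketch of the workaround described in the quoted report below. It assumes a Linux shell and an application targeting netcoreapp3.0; the variable name is taken from that report, and nothing here is specific to Ignite itself.

```shell
# Enable the CoreCLR alternate stack check for this shell session.
# The variable name comes from the report quoted below.
export COMPlus_EnableAlternateStackCheck=1

# Processes started from this shell (for example `dotnet run`) inherit it.
# Print the value that a child process would see:
sh -c 'echo "COMPlus_EnableAlternateStackCheck=$COMPlus_EnableAlternateStackCheck"'
```

The variable can also be set for a single command only, as in the last run shown in the report (VARIABLE=1 in front of dotnet run), which avoids changing the rest of the session.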

On Wed, Sep 18, 2019 at 3:17 PM Eduard Llull <[email protected]> wrote:

> Hello Pavel,
>
> I also found the issue about IGNITE_HOME detection when performing the
> tests with .NET Core 3.0. To make my reproducer start I had to copy the
> jars in the libs/ directory alongside the Apache.Ignite.dll but forgot to
> mention it.
>
> When you say that you plan to make the next Ignite release fully
> compatible with .NET Core 3.0, do you mean Ignite 2.8 or a later release?
>
> I will let you know if we encounter any other issues.
>
>
> Best regards.
>
> On Wed, Sep 18, 2019 at 1:55 PM Pavel Tupitsyn <[email protected]>
> wrote:
>
>> Hi Eduard,
>>
>> First of all, thank you so much for such a detailed report, this is
>> extremely valuable!
>> I've updated our troubleshooting guide:
>> https://apacheignite-net.readme.io/docs/troubleshooting
>>
>> Yes, the JVM installs its own signal handlers:
>> https://docs.oracle.com/javase/9/troubleshoot/handle-signals-and-exceptions.htm
>> This includes SIGSEGV, which is used to handle NullPointerException in
>> Java, and it conflicts with a similar mechanism in .NET.
>> There is an -Xrs option to reduce signal usage, but unfortunately it does
>> not get rid of the SIGSEGV handler.
>>
>> As for .NET Core 3.0 - I have it on my machine and I run some Ignite
>> tests with it from time to time.
>> So far the only issue was with IGNITE_HOME detection with NuGet:
>> https://issues.apache.org/jira/browse/IGNITE-10554, and it has
>> workarounds (copy jar files manually or with a build step).
>> Let me know if you encounter anything else with .NET Core 3.0; we plan to
>> make the next Ignite release fully compatible with it.
>>
>> Thanks,
>> Pavel
>>
>>
>>
>> On Wed, Sep 18, 2019 at 1:48 PM Eduard Llull <[email protected]> wrote:
>>
>>> Hi everyone,
>>>
>>> Almost a month ago I reported that one of our applications that uses the
>>> Ignite.NET thick client was getting SIGSEGVs and SIGABRTs, and that
>>> switching to the thin client fixed the problem but severely impacted
>>> performance [
>>> http://apache-ignite-users.70518.x6.nabble.com/NET-thin-client-multithreaded-td29116.html#a29142].
>>> We suspected that it was related to the fact that the embedded JVM
>>> installs its own signal handlers, but we had no evidence.
>>>
>>> We have been digging into this problem and today we found the cause. It
>>> will be a long email.
>>>
>>> The reproducer is quite simple:
>>> using System;
>>> using Apache.Ignite.Core;
>>>
>>> namespace segfault
>>> {
>>>     class Program
>>>     {
>>>         static void Main(string[] args)
>>>         {
>>>             if (args.Length == 0)
>>>             {
>>>                 Console.WriteLine("Starting Ignite");
>>>                 var thick = Ignition.Start();
>>>             }
>>>             else
>>>             {
>>>                 Console.WriteLine("NOT starting Ignite");
>>>             }
>>>
>>>             string s = null;
>>>             try
>>>             {
>>>                 s.ToUpper();
>>>             }
>>>             catch (NullReferenceException e)
>>>             {
>>>                 Console.WriteLine("Catched exception " + e);
>>>             }
>>>         }
>>>     }
>>> }
>>>
>>> When run as a netcoreapp2.2 application on Linux (tested on Ubuntu 19.04;
>>> I haven't tested it on Windows) without any arguments (so it calls
>>> Ignition.Start()), it crashes:
>>>
>>> $ dotnet run
>>> Starting Ignite
>>> [12:17:55]    __________  ________________
>>> [12:17:55]   /  _/ ___/ |/ /  _/_  __/ __/
>>> [12:17:55]  _/ // (7 7    // /  / / / _/
>>> [12:17:55] /___/\___/_/|_/___/ /_/ /___/
>>> [12:17:55]
>>> [12:17:55] ver. 2.7.5#20190603-sha1:be4f2a15
>>> [12:17:55] 2018 Copyright(C) Apache Software Foundation
>>> [12:17:55]
>>> [12:17:55] Ignite documentation: http://ignite.apache.org
>>> [12:17:55]
>>> [12:17:55] Quiet mode.
>>> [12:17:55]   ^-- Logging by 'JavaLogger [quiet=true, config=null]'
>>> [12:17:55]   ^-- To see **FULL** console log here add
>>> -DIGNITE_QUIET=false or "-v" to ignite.{sh|bat}
>>> [12:17:55]
>>> [12:17:55] OS: Linux 5.0.0-25-generic amd64
>>> [12:17:55] VM information: OpenJDK Runtime Environment
>>> 1.8.0_222-8u222-b10-1ubuntu1~19.04.1-b10 Private Build OpenJDK 64-Bit
>>> Server VM 25.222-b10
>>> [12:17:55] Please set system property '-Djava.net.preferIPv4Stack=true'
>>> to avoid possible problems in mixed environments.
>>> [12:17:55] Initial heap size is 250MB (should be no less than 512MB, use
>>> -Xms512m -Xmx512m).
>>> [12:17:55] Configured plugins:
>>> [12:17:55]   ^-- None
>>> [12:17:55]
>>> [12:17:55] Configured failure handler: [hnd=StopNodeOrHaltFailureHandler
>>> [tryStop=false, timeout=0, super=AbstractFailureHandler
>>> [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED,
>>> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]]
>>> [12:17:55] Message queue limit is set to 0 which may lead to potential
>>> OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due
>>> to message queues growth on sender and receiver sides.
>>> [12:17:55] Security status [authentication=off, tls/ssl=off]
>>> [12:17:57] Performance suggestions for grid  (fix if possible)
>>> [12:17:57] To disable, set
>>> -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true
>>> [12:17:57]   ^-- Enable G1 Garbage Collector (add '-XX:+UseG1GC' to JVM
>>> options)
>>> [12:17:57]   ^-- Specify JVM heap max size (add
>>> '-Xmx<size>[g|G|m|M|k|K]' to JVM options)
>>> [12:17:57]   ^-- Set max direct memory size if getting 'OOME: Direct
>>> buffer memory' (add '-XX:MaxDirectMemorySize=<size>[g|G|m|M|k|K]' to JVM
>>> options)
>>> [12:17:57]   ^-- Disable processing of calls to System.gc() (add
>>> '-XX:+DisableExplicitGC' to JVM options)
>>> [12:17:57] Refer to this page for more performance suggestions:
>>> https://apacheignite.readme.io/docs/jvm-and-system-tuning
>>> [12:17:57]
>>> [12:17:57] To start Console Management & Monitoring run
>>> ignitevisorcmd.{sh|bat}
>>> [12:17:57] Data Regions Configured:
>>> [12:17:57]   ^-- default [initSize=256,0 MiB, maxSize=3,1 GiB,
>>> persistence=false]
>>> [12:17:57]
>>> [12:17:57] Ignite node started OK (id=5dd14995)
>>> [12:17:57] Topology snapshot [ver=1, locNode=5dd14995, servers=1,
>>> clients=0, state=ACTIVE, CPUs=8, offheap=3.1GB, heap=3.5GB]
>>> *** stack smashing detected ***: <unknown> terminated
>>>
>>>
>>>
>>> When run with any argument (so Ignite is not started), the
>>> NullReferenceException is caught and printed to the console:
>>>
>>> $ dotnet run 1
>>> NOT starting Ignite
>>> Catched exception System.NullReferenceException: Object reference not
>>> set to an instance of an object.
>>>   at segfault.Program.Main(String[] args) in
>>> /home/eduard/Development/X-files/segfault-2/Program.cs:line 23
>>>
>>>
>>> So, our guess about the signal handlers looked right, and it was
>>> confirmed when we found these issues in the coreclr GitHub project:
>>>
>>>    1. Stack Smashing Failures (SIGSEGV) instead of
>>>    NullReferenceExceptions [
>>>    https://github.com/dotnet/coreclr/issues/25166]
>>>    2. SIGSEGV is not transformed into NullReferenceException in WSL [
>>>    https://github.com/dotnet/coreclr/issues/25945]
>>>
>>> So, the problem arises because the .NET Core CLR uses an alternate stack
>>> to handle SIGSEGV. When the signal handler registered by a third-party
>>> native library (libjvm.so) calls the CLR signal handler, the CLR handler
>>> is not invoked on the alternate stack; it cannot handle that case, and
>>> the program just exits.
>>>
>>> It seems to be solved in .NET Core SDK 3.0 (tested by running the
>>> application as netcoreapp3.0 with SDK 3.0.100-rc1-014190), but you have
>>> to set the environment variable COMPlus_EnableAlternateStackCheck=1
>>> to enable the alternate stack check [
>>> https://github.com/dotnet/coreclr/issues/25945#issuecomment-517199962].
>>>
>>> Without COMPlus_EnableAlternateStackCheck, the application still crashes
>>> on .NET Core 3.0:
>>>
>>> $ grep netcoreapp segfault-2.csproj; dotnet run
>>>    <TargetFramework>netcoreapp3.0</TargetFramework>
>>> Starting Ignite
>>> [12:33:38]    __________  ________________
>>> [12:33:38]   /  _/ ___/ |/ /  _/_  __/ __/
>>> [12:33:38]  _/ // (7 7    // /  / / / _/
>>> [12:33:38] /___/\___/_/|_/___/ /_/ /___/
>>> [12:33:38]
>>> [12:33:38] ver. 2.7.5#20190603-sha1:be4f2a15
>>> [12:33:38] 2018 Copyright(C) Apache Software Foundation
>>> [12:33:38]
>>> [12:33:38] Ignite documentation: http://ignite.apache.org
>>> [12:33:38]
>>> [12:33:38] Quiet mode.
>>> [12:33:38]   ^-- Logging by 'JavaLogger [quiet=true, config=null]'
>>> [12:33:38]   ^-- To see **FULL** console log here add
>>> -DIGNITE_QUIET=false or "-v" to ignite.{sh|bat}
>>> [12:33:38]
>>> [12:33:38] OS: Linux 5.0.0-25-generic amd64
>>> [12:33:38] VM information: OpenJDK Runtime Environment
>>> 1.8.0_222-8u222-b10-1ubuntu1~19.04.1-b10 Private Build OpenJDK 64-Bit
>>> Server VM 25.222-b10
>>> [12:33:39] Please set system property '-Djava.net.preferIPv4Stack=true'
>>> to avoid possible problems in mixed environments.
>>> [12:33:39] Initial heap size is 250MB (should be no less than 512MB, use
>>> -Xms512m -Xmx512m).
>>> [12:33:39] Configured plugins:
>>> [12:33:39]   ^-- None
>>> [12:33:39]
>>> [12:33:39] Configured failure handler: [hnd=StopNodeOrHaltFailureHandler
>>> [tryStop=false, timeout=0, super=AbstractFailureHandler
>>> [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED,
>>> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]]
>>> [12:33:39] Message queue limit is set to 0 which may lead to potential
>>> OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due
>>> to message queues growth on sender and receiver sides.
>>> [12:33:39] Security status [authentication=off, tls/ssl=off]
>>> [12:33:40] Performance suggestions for grid  (fix if possible)
>>> [12:33:40] To disable, set
>>> -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true
>>> [12:33:40]   ^-- Enable G1 Garbage Collector (add '-XX:+UseG1GC' to JVM
>>> options)
>>> [12:33:40]   ^-- Specify JVM heap max size (add
>>> '-Xmx<size>[g|G|m|M|k|K]' to JVM options)
>>> [12:33:40]   ^-- Set max direct memory size if getting 'OOME: Direct
>>> buffer memory' (add '-XX:MaxDirectMemorySize=<size>[g|G|m|M|k|K]' to JVM
>>> options)
>>> [12:33:40]   ^-- Disable processing of calls to System.gc() (add
>>> '-XX:+DisableExplicitGC' to JVM options)
>>> [12:33:40] Refer to this page for more performance suggestions:
>>> https://apacheignite.readme.io/docs/jvm-and-system-tuning
>>> [12:33:40]
>>> [12:33:40] To start Console Management & Monitoring run
>>> ignitevisorcmd.{sh|bat}
>>> [12:33:40] Data Regions Configured:
>>> [12:33:40]   ^-- default [initSize=256,0 MiB, maxSize=3,1 GiB,
>>> persistence=false]
>>> [12:33:40]
>>> [12:33:40] Ignite node started OK (id=711e0976)
>>> [12:33:40] Topology snapshot [ver=1, locNode=711e0976, servers=1,
>>> clients=0, state=ACTIVE, CPUs=8, offheap=3.1GB, heap=3.5GB]
>>> *** stack smashing detected ***: <unknown> terminated
>>>
>>>
>>> With COMPlus_EnableAlternateStackCheck=1, the exception is caught:
>>>
>>> $ grep netcoreapp segfault-2.csproj; COMPlus_EnableAlternateStackCheck=1 dotnet run
>>>    <TargetFramework>netcoreapp3.0</TargetFramework>
>>> Starting Ignite
>>> [12:35:20]    __________  ________________
>>> [12:35:20]   /  _/ ___/ |/ /  _/_  __/ __/
>>> [12:35:20]  _/ // (7 7    // /  / / / _/
>>> [12:35:20] /___/\___/_/|_/___/ /_/ /___/
>>> [12:35:20]
>>> [12:35:20] ver. 2.7.5#20190603-sha1:be4f2a15
>>> [12:35:20] 2018 Copyright(C) Apache Software Foundation
>>> [12:35:20]
>>> [12:35:20] Ignite documentation: http://ignite.apache.org
>>> [12:35:20]
>>> [12:35:20] Quiet mode.
>>> [12:35:20]   ^-- Logging by 'JavaLogger [quiet=true, config=null]'
>>> [12:35:20]   ^-- To see **FULL** console log here add
>>> -DIGNITE_QUIET=false or "-v" to ignite.{sh|bat}
>>> [12:35:20]
>>> [12:35:20] OS: Linux 5.0.0-25-generic amd64
>>> [12:35:20] VM information: OpenJDK Runtime Environment
>>> 1.8.0_222-8u222-b10-1ubuntu1~19.04.1-b10 Private Build OpenJDK 64-Bit
>>> Server VM 25.222-b10
>>> [12:35:20] Please set system property '-Djava.net.preferIPv4Stack=true'
>>> to avoid possible problems in mixed environments.
>>> [12:35:20] Initial heap size is 250MB (should be no less than 512MB, use
>>> -Xms512m -Xmx512m).
>>> [12:35:21] Configured plugins:
>>> [12:35:21]   ^-- None
>>> [12:35:21]
>>> [12:35:21] Configured failure handler: [hnd=StopNodeOrHaltFailureHandler
>>> [tryStop=false, timeout=0, super=AbstractFailureHandler
>>> [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED,
>>> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]]
>>> [12:35:21] Message queue limit is set to 0 which may lead to potential
>>> OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due
>>> to message queues growth on sender and receiver sides.
>>> [12:35:21] Security status [authentication=off, tls/ssl=off]
>>> [12:35:22] Performance suggestions for grid  (fix if possible)
>>> [12:35:22] To disable, set
>>> -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true
>>> [12:35:22]   ^-- Enable G1 Garbage Collector (add '-XX:+UseG1GC' to JVM
>>> options)
>>> [12:35:22]   ^-- Specify JVM heap max size (add
>>> '-Xmx<size>[g|G|m|M|k|K]' to JVM options)
>>> [12:35:22]   ^-- Set max direct memory size if getting 'OOME: Direct
>>> buffer memory' (add '-XX:MaxDirectMemorySize=<size>[g|G|m|M|k|K]' to JVM
>>> options)
>>> [12:35:22]   ^-- Disable processing of calls to System.gc() (add
>>> '-XX:+DisableExplicitGC' to JVM options)
>>> [12:35:22] Refer to this page for more performance suggestions:
>>> https://apacheignite.readme.io/docs/jvm-and-system-tuning
>>> [12:35:22]
>>> [12:35:22] To start Console Management & Monitoring run
>>> ignitevisorcmd.{sh|bat}
>>> [12:35:22] Data Regions Configured:
>>> [12:35:22]   ^-- default [initSize=256,0 MiB, maxSize=3,1 GiB,
>>> persistence=false]
>>> [12:35:22]
>>> [12:35:22] Ignite node started OK (id=841d9bca)
>>> [12:35:22] Topology snapshot [ver=1, locNode=841d9bca, servers=1,
>>> clients=0, state=ACTIVE, CPUs=8, offheap=3.1GB, heap=3.5GB]
>>> Catched exception System.NullReferenceException: Object reference not
>>> set to an instance of an object.
>>>   at segfault.Program.Main(String[] args) in
>>> /home/eduard/Development/X-files/segfault-2/Program.cs:line 23
>>>
>>>
>>> Our plan is to move our application to .NET Core 3.0 and the thick
>>> client. We know that it is currently an RC, but it is expected to be
>>> released on the 23rd of September. Since we will be running in-depth
>>> tests to see whether anything breaks, we expect 3.0 to be released by
>>> the time we decide to deploy to production.
>>>
>>> So, first of all, I wanted to let you know about this issue in case
>>> anybody runs into the same situation.
>>>
>>> And finally, do you guys foresee any problem with the migration?
>>>
>>>
