[ 
https://issues.apache.org/jira/browse/IGNITE-7476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Kasnacheev updated IGNITE-7476:
------------------------------------
    Description: 
Sometimes a server node fails with the following trace:
{code:java}
SEVERE: TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in order to prevent cluster wide instability.
java.lang.NullPointerException
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$7.cacheMetrics(GridDiscoveryManager.java:1149)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMetricsUpdateMessage(ServerImpl.java:5022)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2690)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2491)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6675)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2574)
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
{code}
Two problems here:
 * An uncaught exception in cacheMetrics() unconditionally fails the node,
because it is thrown in the discovery thread. All non-trivial code there
should probably be wrapped in try-catch.
 * Proper locking is missing when a cache is destroyed (see also IGNITE-6580,
IGNITE-7278 and IGNITE-7165).
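
The two points above can be illustrated with a minimal, hypothetical sketch. It is not Ignite code: the class MetricsGuardSketch, its methods and the Supplier-based registry are all assumptions made for illustration. It only shows the general shape of the idea: gather metrics under a read lock so that cache destruction (which takes the write lock) cannot interleave, and catch runtime exceptions so that a failure degrades to an empty snapshot instead of killing the calling thread.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names exist in Ignite. */
final class MetricsGuardSketch {
    /** Guards the registry so a snapshot never overlaps with cache destruction. */
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    /** Cache name -> metric source; stands in for a real cache registry. */
    private final Map<String, Supplier<Long>> caches = new ConcurrentHashMap<>();

    /** Registers a cache under the write lock. */
    void registerCache(String name, Supplier<Long> sizeMetric) {
        lock.writeLock().lock();
        try {
            caches.put(name, sizeMetric);
        }
        finally {
            lock.writeLock().unlock();
        }
    }

    /** Destroys a cache under the write lock, excluding concurrent snapshots. */
    void destroyCache(String name) {
        lock.writeLock().lock();
        try {
            caches.remove(name);
        }
        finally {
            lock.writeLock().unlock();
        }
    }

    /**
     * Meant to be called from a critical thread (the discovery thread in the
     * issue above), so it never lets a runtime exception escape.
     */
    Map<String, Long> cacheMetrics() {
        lock.readLock().lock();
        try {
            Map<String, Long> res = new HashMap<>();

            for (Map.Entry<String, Supplier<Long>> e : caches.entrySet())
                res.put(e.getKey(), e.getValue().get());

            return res;
        }
        catch (RuntimeException e) {
            // Report the failure and return an empty snapshot instead of
            // propagating the exception to the caller.
            System.err.println("Failed to gather cache metrics: " + e);

            return Collections.emptyMap();
        }
        finally {
            lock.readLock().unlock();
        }
    }
}
```

Returning an empty snapshot on failure is just one possible policy; skipping only the failing cache would be another. Which of the two is appropriate for the real cacheMetrics() is not stated in this issue.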

 

  was:
Sometimes a server node fails with the following trace:
{code:java}
SEVERE: TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in order to prevent cluster wide instability.
java.lang.NullPointerException
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$7.cacheMetrics(GridDiscoveryManager.java:1149)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMetricsUpdateMessage(ServerImpl.java:5022)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2690)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2491)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6675)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2574)
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
{code}
Two problems here:
 * An uncaught exception in cacheMetrics() unconditionally fails the node,
because it is thrown in the discovery thread. All non-trivial code there
should probably be wrapped in try-catch.
 * Proper locking is missing when a cache is destroyed (see also IGNITE-6423
and IGNITE-7165).

 


> Server node will join with failure gathering metrics
> ----------------------------------------------------
>
>                 Key: IGNITE-7476
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7476
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Ilya Kasnacheev
>            Priority: Critical
>
> Sometimes a server node fails with the following trace:
> {code:java}
> SEVERE: TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in order to prevent cluster wide instability.
> java.lang.NullPointerException
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$7.cacheMetrics(GridDiscoveryManager.java:1149)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMetricsUpdateMessage(ServerImpl.java:5022)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2690)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2491)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6675)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2574)
>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {code}
> Two problems here:
>  * An uncaught exception in cacheMetrics() unconditionally fails the node,
> because it is thrown in the discovery thread. All non-trivial code there
> should probably be wrapped in try-catch.
>  * Proper locking is missing when a cache is destroyed (see also IGNITE-6580,
> IGNITE-7278 and IGNITE-7165).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
