[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117800#comment-16117800 ] ASF subversion and git services commented on ARTEMIS-1324: -- Commit 88f78d97ef547c5b0bb231ceb0389f7d38a9cec8 in activemq-artemis's branch refs/heads/master from Clebert Suconic [ https://git-wip-us.apache.org/repos/asf?p=activemq-artemis.git;h=88f78d9 ] ARTEMIS-1324 Fix test > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117480#comment-16117480 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user asfgit closed the pull request at: https://github.com/apache/activemq-artemis/pull/1443 > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117478#comment-16117478 ] ASF subversion and git services commented on ARTEMIS-1324: -- Commit f16af7535417e3ba3ad625d22430edda7b76b5d2 in activemq-artemis's branch refs/heads/master from Clebert Suconic [ https://git-wip-us.apache.org/repos/asf?p=activemq-artemis.git;h=f16af75 ] ARTEMIS-1324 Deadlock detection and health check of critical components > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117396#comment-16117396 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131774529 --- Diff: docs/user-manual/en/configuration-index.md --- @@ -118,6 +118,11 @@ system-property-prefix | Prefix for replacing configuration settings using Bean [network-check-list](network-isolation.md) | The list of pings to be used on ping or InetAddress.isReacheable [network-check-ping-command](network-isolation.md) | The command used to oping IPV4 addresses [network-check-ping6-command](network-isolation.md) | The command used to oping IPV6 addresses +[critical-analyzer](critical-analysis.md) | Enable or disable the critical analysis (default true) --- End diff -- @michaelandrepearce it's just I didn't understand. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117397#comment-16117397 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131774705 --- Diff: artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java --- @@ -0,0 +1,182 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.activemq.artemis.utils.critical; + +import java.util.ConcurrentModificationException; +import java.util.concurrent.CopyOnWriteArrayList; +import java.util.concurrent.Semaphore; +import java.util.concurrent.TimeUnit; + +import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet; +import org.jboss.logging.Logger; + +public class CriticalAnalyzerImpl implements CriticalAnalyzer { + + private final Logger logger = Logger.getLogger(CriticalAnalyzer.class); + + private volatile long timeout; + + private volatile long checkTime; + + @Override + public void clear() { + actions.clear(); + components.clear(); + } + + private CopyOnWriteArrayList actions = new CopyOnWriteArrayList<>(); + + private Thread thread; + + private final Semaphore running = new Semaphore(1); + + private final ConcurrentHashSet components = new ConcurrentHashSet<>(); + + @Override + public void add(CriticalComponent component) { + components.add(component); + } + + @Override + public void remove(CriticalComponent component) { + components.remove(component); + } + + @Override + public CriticalAnalyzer setCheckTime(long timeout) { + this.checkTime = timeout; + return this; + } + + @Override + public long getCheckTime() { + if (checkTime == 0) { + checkTime = getTimeout() / 2; + } + return checkTime; + } + + @Override + public CriticalAnalyzer setTimeout(long timeout) { + this.timeout = timeout; + return this; + } + + @Override + public long getTimeout() { + if (timeout == 0) { + timeout = TimeUnit.MINUTES.toMillis(2); + } + return timeout; + } + + @Override + public CriticalAnalyzer addAction(CriticalAction action) { + this.actions.add(action); + return this; + } + + @Override + public void check() { + boolean retry = true; + while (retry) { + try { +for (CriticalComponent component : components) { + + int pathReturned = component.isExpired(timeout); + if (pathReturned >= 0) { + fireAction(component, pathReturned); + // no need to keep running if there's already a component failed + return; + } +} +retry = false; // got to the end of the list, no need to retry + } catch (ConcurrentModificationException dontCare) { +// lets retry on the loop + } + } + } + + private void fireAction(CriticalComponent component, int path) { + for (CriticalAction action: actions) { + try { +action.run(component, path); + } catch (Throwable e) { +logger.warn(e.getMessage(), e); + } + } + } + + @Override + public void start() { + + if (!running.tryAcquire()) { + // already running + return; + } + + // we are not using any Thread Pool or any Scheduled Executors from the ArtemisServer + // as that would defeat the purpose, +
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117391#comment-16117391 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131774052 --- Diff: artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java --- @@ -0,0 +1,182 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.activemq.artemis.utils.critical; + +import java.util.ConcurrentModificationException; +import java.util.concurrent.CopyOnWriteArrayList; +import java.util.concurrent.Semaphore; +import java.util.concurrent.TimeUnit; + +import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet; +import org.jboss.logging.Logger; + +public class CriticalAnalyzerImpl implements CriticalAnalyzer { + + private final Logger logger = Logger.getLogger(CriticalAnalyzer.class); + + private volatile long timeout; + + private volatile long checkTime; + + @Override + public void clear() { + actions.clear(); + components.clear(); + } + + private CopyOnWriteArrayList actions = new CopyOnWriteArrayList<>(); + + private Thread thread; + + private final Semaphore running = new Semaphore(1); + + private final ConcurrentHashSet components = new ConcurrentHashSet<>(); + + @Override + public void add(CriticalComponent component) { + components.add(component); + } + + @Override + public void remove(CriticalComponent component) { + components.remove(component); + } + + @Override + public CriticalAnalyzer setCheckTime(long timeout) { + this.checkTime = timeout; + return this; + } + + @Override + public long getCheckTime() { + if (checkTime == 0) { + checkTime = getTimeout() / 2; + } + return checkTime; + } + + @Override + public CriticalAnalyzer setTimeout(long timeout) { + this.timeout = timeout; + return this; + } + + @Override + public long getTimeout() { + if (timeout == 0) { + timeout = TimeUnit.MINUTES.toMillis(2); + } + return timeout; + } + + @Override + public CriticalAnalyzer addAction(CriticalAction action) { + this.actions.add(action); + return this; + } + + @Override + public void check() { + boolean retry = true; + while (retry) { + try { +for (CriticalComponent component : components) { + + int pathReturned = component.isExpired(timeout); + if (pathReturned >= 0) { + fireAction(component, pathReturned); + // no need to keep running if there's already a component failed + return; + } +} +retry = false; // got to the end of the list, no need to retry + } catch (ConcurrentModificationException dontCare) { +// lets retry on the loop + } + } + } + + private void fireAction(CriticalComponent component, int path) { + for (CriticalAction action: actions) { + try { +action.run(component, path); + } catch (Throwable e) { +logger.warn(e.getMessage(), e); + } + } + } + + @Override + public void start() { + + if (!running.tryAcquire()) { + // already running + return; + } + + // we are not using any Thread Pool or any Scheduled Executors from the ArtemisServer + // as that would defeat the purpose,
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117387#comment-16117387 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131773734 --- Diff: docs/user-manual/en/critical-analysis.md --- @@ -0,0 +1,32 @@ +# Critical Analysis of the broker + +There are a few things that can go wrong on a production environment: + +- Bugs, for more than we try they still happen! We always try to correct them, but that's the only constant in software development. +- IO Errors, disks and hardware can go bad +- Memory issues, the CPU can go crazy by another process + +For cases like this, we added a protection to the broker to shut itself down when bad things happen. + +We measure time response in places like: + +- Queue delivery (add to the queue) +- Journal storage +- Paging operations + --- End diff -- It would need to be a different feature I think. Monitoring has been on my radar since @franz1981 raised that PR on the CLI. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117385#comment-16117385 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131773637 --- Diff: docs/user-manual/en/configuration-index.md --- @@ -118,6 +118,11 @@ system-property-prefix | Prefix for replacing configuration settings using Bean [network-check-list](network-isolation.md) | The list of pings to be used on ping or InetAddress.isReacheable [network-check-ping-command](network-isolation.md) | The command used to oping IPV4 addresses [network-check-ping6-command](network-isolation.md) | The command used to oping IPV6 addresses +[critical-analyzer](critical-analysis.md) | Enable or disable the critical analysis (default true) --- End diff -- agreed this is just a nit, something that can be done later > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117386#comment-16117386 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131773683 --- Diff: docs/user-manual/en/critical-analysis.md --- @@ -0,0 +1,32 @@ +# Critical Analysis of the broker + +There are a few things that can go wrong on a production environment: + +- Bugs, for more than we try they still happen! We always try to correct them, but that's the only constant in software development. +- IO Errors, disks and hardware can go bad +- Memory issues, the CPU can go crazy by another process + +For cases like this, we added a protection to the broker to shut itself down when bad things happen. + +We measure time response in places like: + +- Queue delivery (add to the queue) +- Journal storage +- Paging operations + --- End diff -- something for later btw, just a thought. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117381#comment-16117381 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131773379 --- Diff: docs/user-manual/en/configuration-index.md --- @@ -118,6 +118,11 @@ system-property-prefix | Prefix for replacing configuration settings using Bean [network-check-list](network-isolation.md) | The list of pings to be used on ping or InetAddress.isReacheable [network-check-ping-command](network-isolation.md) | The command used to oping IPV4 addresses [network-check-ping6-command](network-isolation.md) | The command used to oping IPV6 addresses +[critical-analyzer](critical-analysis.md) | Enable or disable the critical analysis (default true) --- End diff -- I'm not sure what you mean here. I'm adding some doc changes. Can you change it after merged? > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117365#comment-16117365 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131771529 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ActiveMQServerImpl.java --- @@ -479,12 +499,72 @@ public final synchronized void start() throws Exception { } } + @Override + public CriticalAnalyzer getCriticalAnalyzer() { + return this.analyzer; + } + private void internalStart() throws Exception { if (state != SERVER_STATE.STOPPED) { logger.debug("Server already started!"); return; } + /** Calling this for cases where the server was stopped and now is being restarted... failback, etc...*/ + this.analyzer.clear(); + + this.getCriticalAnalyzer().setCheckTime(configuration.getCriticalAnalyzerCheckPeriod()).setTimeout(configuration.getCriticalAnalyzerTimeout()); + + if (configuration.isCriticalAnalyzer()) { + this.getCriticalAnalyzer().start(); + } + + this.getCriticalAnalyzer().addAction((CriticalComponent c) -> { + ActiveMQServerLogger.LOGGER.irresponsiveComponent(c); + threadDump(); + + // on the case of a critical failure, -1 cannot simply means forever. + // in case graceful is -1, we will set it to 30 seconds + long timeout = configuration.getGracefulShutdownTimeout() < 0 ? 3 : configuration.getGracefulShutdownTimeout(); + + Thread notificationSender = new Thread() { +public void run () { --- End diff -- should this need an Overrides? > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117360#comment-16117360 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131771146 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java --- @@ -2064,6 +2072,53 @@ public Configuration setNetworkCheckPing6Command(String command) { return this; } + @Override + public boolean isCriticalAnalyzer() { + return criticalAnalyzer; + } + + @Override + public Configuration setCriticalAnalyzer(boolean CriticalAnalyzer) { + this.criticalAnalyzer = CriticalAnalyzer; + return this; + } + + @Override + public long getCriticalAnalyzerTimeout() { + return criticalAnalyzerTimeout; + } + + @Override + public Configuration setCriticalAnalyzerTimeout(long timeout) { + this.criticalAnalyzerTimeout = timeout; + return this; + } + + @Override + public long getCriticalAnalyzerCheckPeriod() { + if (criticalAnalyzerCheckPeriod <= 0) { --- End diff -- agreed what is there is fine, these are only code nits > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117358#comment-16117358 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131770829 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java --- @@ -294,6 +294,14 @@ private String internalNamingPrefix = ActiveMQDefaultConfiguration.getInternalNamingPrefix(); + private boolean criticalAnalyzer = ActiveMQDefaultConfiguration.getCriticalAnalyzer(); + + private boolean criticalAnalyzerHalt = ActiveMQDefaultConfiguration.getCriticalAnalyzerHalt(); + + private long criticalAnalyzerTimeout = ActiveMQDefaultConfiguration.getCriticalAnalyzerTimeout(); + + private long criticalAnalyzerCheckPeriod = 0; // non set --- End diff -- i was meaning make a static contstant ANALYZE_CHECK_PERIOD_NOT_SET=0 and use that, it make it less of a magic number. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117353#comment-16117353 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131769789 --- Diff: docs/user-manual/en/critical-analysis.md --- @@ -0,0 +1,32 @@ +# Critical Analysis of the broker + +There are a few things that can go wrong on a production environment: + +- Bugs, for more than we try they still happen! We always try to correct them, but that's the only constant in software development. +- IO Errors, disks and hardware can go bad +- Memory issues, the CPU can go crazy by another process + +For cases like this, we added a protection to the broker to shut itself down when bad things happen. + +We measure time response in places like: + +- Queue delivery (add to the queue) +- Journal storage +- Paging operations + --- End diff -- I was just thinking the averages and percentiles of this, would make great things to be monitoring when healthy. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117352#comment-16117352 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131769469 --- Diff: artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java --- @@ -0,0 +1,182 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.activemq.artemis.utils.critical; + +import java.util.ConcurrentModificationException; +import java.util.concurrent.CopyOnWriteArrayList; +import java.util.concurrent.Semaphore; +import java.util.concurrent.TimeUnit; + +import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet; +import org.jboss.logging.Logger; + +public class CriticalAnalyzerImpl implements CriticalAnalyzer { + + private final Logger logger = Logger.getLogger(CriticalAnalyzer.class); + + private volatile long timeout; + + private volatile long checkTime; + + @Override + public void clear() { + actions.clear(); + components.clear(); + } + + private CopyOnWriteArrayList actions = new CopyOnWriteArrayList<>(); + + private Thread thread; + + private final Semaphore running = new Semaphore(1); + + private final ConcurrentHashSet components = new ConcurrentHashSet<>(); + + @Override + public void add(CriticalComponent component) { + components.add(component); + } + + @Override + public void remove(CriticalComponent component) { + components.remove(component); + } + + @Override + public CriticalAnalyzer setCheckTime(long timeout) { + this.checkTime = timeout; + return this; + } + + @Override + public long getCheckTime() { + if (checkTime == 0) { + checkTime = getTimeout() / 2; + } + return checkTime; + } + + @Override + public CriticalAnalyzer setTimeout(long timeout) { + this.timeout = timeout; + return this; + } + + @Override + public long getTimeout() { + if (timeout == 0) { + timeout = TimeUnit.MINUTES.toMillis(2); + } + return timeout; + } + + @Override + public CriticalAnalyzer addAction(CriticalAction action) { + this.actions.add(action); + return this; + } + + @Override + public void check() { + boolean retry = true; + while (retry) { + try { +for (CriticalComponent component : components) { + + int pathReturned = component.isExpired(timeout); + if (pathReturned >= 0) { + fireAction(component, pathReturned); + // no need to keep running if there's already a component failed + return; + } +} +retry = false; // got to the end of the list, no need to retry + } catch (ConcurrentModificationException dontCare) { +// lets retry on the loop + } + } + } + + private void fireAction(CriticalComponent component, int path) { + for (CriticalAction action: actions) { + try { +action.run(component, path); + } catch (Throwable e) { +logger.warn(e.getMessage(), e); + } + } + } + + @Override + public void start() { + + if (!running.tryAcquire()) { + // already running + return; + } + + // we are not using any Thread Pool or any Scheduled Executors from the ArtemisServer + // as that would defeat the purpose, +
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117348#comment-16117348 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131769055 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java --- @@ -2064,6 +2072,53 @@ public Configuration setNetworkCheckPing6Command(String command) { return this; } + @Override + public boolean isCriticalAnalyzer() { + return criticalAnalyzer; + } + + @Override + public Configuration setCriticalAnalyzer(boolean CriticalAnalyzer) { + this.criticalAnalyzer = CriticalAnalyzer; + return this; + } + + @Override + public long getCriticalAnalyzerTimeout() { + return criticalAnalyzerTimeout; + } + + @Override + public Configuration setCriticalAnalyzerTimeout(long timeout) { + this.criticalAnalyzerTimeout = timeout; + return this; + } + + @Override + public long getCriticalAnalyzerCheckPeriod() { + if (criticalAnalyzerCheckPeriod <= 0) { --- End diff -- I can always merge what I have now.. and make changes later :) > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117343#comment-16117343 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131768837 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java --- @@ -294,6 +294,14 @@ private String internalNamingPrefix = ActiveMQDefaultConfiguration.getInternalNamingPrefix(); + private boolean criticalAnalyzer = ActiveMQDefaultConfiguration.getCriticalAnalyzer(); + + private boolean criticalAnalyzerHalt = ActiveMQDefaultConfiguration.getCriticalAnalyzerHalt(); + + private long criticalAnalyzerTimeout = ActiveMQDefaultConfiguration.getCriticalAnalyzerTimeout(); + + private long criticalAnalyzerCheckPeriod = 0; // non set --- End diff -- I will just set 0.. meaning not set. I don't want to make this an Object just for this. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117335#comment-16117335 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131768028 --- Diff: artemis-core-client/src/main/java/org/apache/activemq/artemis/api/config/ActiveMQDefaultConfiguration.java --- @@ -477,6 +477,16 @@ public static String getDefaultHapolicyBackupStrategy() { public static int DEFAULT_QUORUM_SIZE = -1; + public static final boolean DEFAULT_ANALYZE_CRITICAL = true; + + public static final long DEFAULT_ANALYZE_CRITICAL_TIMEOUT = 12; + + // this will be 0, the implementation should return 1/2 of the configured critical timeout + public static final long DEFAULT_ANALYZE_CHECK_PERIOD = 0; --- End diff -- @clebertsuconic i put some other comments, maybe keep it and not have magic numbers in the other code, I've pointed out. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117333#comment-16117333 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on the issue: https://github.com/apache/activemq-artemis/pull/1443 @clebertsuconic I've left more code nits, and user feedback around the docs. Not really able to put the code through its paces, no doubt won't be able to do that till it hits a real environment, it seems reasonable the pseudo logic you describe. Also the test cases added look like they cover some good scenarios. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117327#comment-16117327 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131766984 --- Diff: docs/user-manual/en/configuration-index.md --- @@ -118,6 +118,11 @@ system-property-prefix | Prefix for replacing configuration settings using Bean [network-check-list](network-isolation.md) | The list of pings to be used on ping or InetAddress.isReacheable [network-check-ping-command](network-isolation.md) | The command used to oping IPV4 addresses [network-check-ping6-command](network-isolation.md) | The command used to oping IPV6 addresses +[critical-analyzer](critical-analysis.md) | Enable or disable the critical analysis (default true) --- End diff -- just a note if we change descriptions in critical-analysis based on review comments, probably should make these match. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117326#comment-16117326 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131766851 --- Diff: docs/user-manual/en/critical-analysis.md --- @@ -0,0 +1,32 @@ +# Critical Analysis of the broker + +There are a few things that can go wrong on a production environment: + +- Bugs, for more than we try they still happen! We always try to correct them, but that's the only constant in software development. +- IO Errors, disks and hardware can go bad +- Memory issues, the CPU can go crazy by another process + +For cases like this, we added a protection to the broker to shut itself down when bad things happen. + +We measure time response in places like: + +- Queue delivery (add to the queue) +- Journal storage +- Paging operations + +If the response time goes beyond a configured timeout, the broker is considered unstable and an action will be taken to either shutdown the broker or halt the VM. + +You can use these following configuration options on broker.xml to configure how the critical analysis is performed. + + +Name | Description +:--- | :--- +critical-analyzer | Enable or disable the critical analysis (default true) +critical-analyzer-timeout | Timeout used to do the critical analysis (default 12 milliseconds) +critical-analyzer-check-period | Time used to check the response times (default half of critical-analyzer-timeout) +critical-analyzer-halt | Should the VM be halted upon failures (default false) --- End diff -- describe behaviour if true? > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117324#comment-16117324 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131766464 --- Diff: docs/user-manual/en/critical-analysis.md --- @@ -0,0 +1,32 @@ +# Critical Analysis of the broker + +There are a few things that can go wrong on a production environment: + +- Bugs, for more than we try they still happen! We always try to correct them, but that's the only constant in software development. +- IO Errors, disks and hardware can go bad +- Memory issues, the CPU can go crazy by another process + +For cases like this, we added a protection to the broker to shut itself down when bad things happen. + +We measure time response in places like: + +- Queue delivery (add to the queue) +- Journal storage +- Paging operations + +If the response time goes beyond a configured timeout, the broker is considered unstable and an action will be taken to either shutdown the broker or halt the VM. + +You can use these following configuration options on broker.xml to configure how the critical analysis is performed. + + +Name | Description +:--- | :--- +critical-analyzer | Enable or disable the critical analysis (default true) --- End diff -- maybe be explicit in what value enable and disable should take - as in: Enable (true) or disable (false) > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117320#comment-16117320 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131766104 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java --- @@ -2064,6 +2072,53 @@ public Configuration setNetworkCheckPing6Command(String command) { return this; } + @Override + public boolean isCriticalAnalyzer() { + return criticalAnalyzer; + } + + @Override + public Configuration setCriticalAnalyzer(boolean CriticalAnalyzer) { + this.criticalAnalyzer = CriticalAnalyzer; + return this; + } + + @Override + public long getCriticalAnalyzerTimeout() { + return criticalAnalyzerTimeout; + } + + @Override + public Configuration setCriticalAnalyzerTimeout(long timeout) { + this.criticalAnalyzerTimeout = timeout; + return this; + } + + @Override + public long getCriticalAnalyzerCheckPeriod() { + if (criticalAnalyzerCheckPeriod <= 0) { --- End diff -- ditto i assume 0 should be constant DEFAULT_ANALYZE_CHECK_PERIOD > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117317#comment-16117317 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131765975 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java --- @@ -294,6 +294,14 @@ private String internalNamingPrefix = ActiveMQDefaultConfiguration.getInternalNamingPrefix(); + private boolean criticalAnalyzer = ActiveMQDefaultConfiguration.getCriticalAnalyzer(); + + private boolean criticalAnalyzerHalt = ActiveMQDefaultConfiguration.getCriticalAnalyzerHalt(); + + private long criticalAnalyzerTimeout = ActiveMQDefaultConfiguration.getCriticalAnalyzerTimeout(); + + private long criticalAnalyzerCheckPeriod = 0; // non set --- End diff -- I guess 0 should be a reference to constant DEFAULT_ANALYZE_CHECK_PERIOD i called out earlier as having no reference. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117313#comment-16117313 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131765567 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java --- @@ -294,6 +294,14 @@ private String internalNamingPrefix = ActiveMQDefaultConfiguration.getInternalNamingPrefix(); + private boolean criticalAnalyzer = ActiveMQDefaultConfiguration.getCriticalAnalyzer(); + + private boolean criticalAnalyzerHalt = ActiveMQDefaultConfiguration.getCriticalAnalyzerHalt(); + + private long criticalAnalyzerTimeout = ActiveMQDefaultConfiguration.getCriticalAnalyzerTimeout(); + + private long criticalAnalyzerCheckPeriod = 0; // non set --- End diff -- ignore, just worked out whats going on here. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117309#comment-16117309 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131765286 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java --- @@ -294,6 +294,14 @@ private String internalNamingPrefix = ActiveMQDefaultConfiguration.getInternalNamingPrefix(); + private boolean criticalAnalyzer = ActiveMQDefaultConfiguration.getCriticalAnalyzer(); + + private boolean criticalAnalyzerHalt = ActiveMQDefaultConfiguration.getCriticalAnalyzerHalt(); + + private long criticalAnalyzerTimeout = ActiveMQDefaultConfiguration.getCriticalAnalyzerTimeout(); + + private long criticalAnalyzerCheckPeriod = 0; // non set --- End diff -- @michaelandrepearce nope.. because the value will only be determined during runtime (when getCriticalAnalyzerCheckPeriod is called. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117310#comment-16117310 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131765381 --- Diff: artemis-core-client/src/main/java/org/apache/activemq/artemis/api/config/ActiveMQDefaultConfiguration.java --- @@ -477,6 +477,16 @@ public static String getDefaultHapolicyBackupStrategy() { public static int DEFAULT_QUORUM_SIZE = -1; + public static final boolean DEFAULT_ANALYZE_CRITICAL = true; + + public static final long DEFAULT_ANALYZE_CRITICAL_TIMEOUT = 12; + + // this will be 0, the implementation should return 1/2 of the configured critical timeout + public static final long DEFAULT_ANALYZE_CHECK_PERIOD = 0; --- End diff -- nope.. will remove it > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117306#comment-16117306 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131765096 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java --- @@ -294,6 +294,14 @@ private String internalNamingPrefix = ActiveMQDefaultConfiguration.getInternalNamingPrefix(); + private boolean criticalAnalyzer = ActiveMQDefaultConfiguration.getCriticalAnalyzer(); + + private boolean criticalAnalyzerHalt = ActiveMQDefaultConfiguration.getCriticalAnalyzerHalt(); + + private long criticalAnalyzerTimeout = ActiveMQDefaultConfiguration.getCriticalAnalyzerTimeout(); + + private long criticalAnalyzerCheckPeriod = 0; // non set --- End diff -- should this not be criticalAnalyzerCheckPeriod = ActiveMQDefaultConfiguration.getCriticalAnalyzerCheckPeriod(criticalAnalyzerTimeout) > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117300#comment-16117300 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131764621 --- Diff: artemis-core-client/src/main/java/org/apache/activemq/artemis/api/config/ActiveMQDefaultConfiguration.java --- @@ -477,6 +477,16 @@ public static String getDefaultHapolicyBackupStrategy() { public static int DEFAULT_QUORUM_SIZE = -1; + public static final boolean DEFAULT_ANALYZE_CRITICAL = true; + + public static final long DEFAULT_ANALYZE_CRITICAL_TIMEOUT = 12; + + // this will be 0, the implementation should return 1/2 of the configured critical timeout + public static final long DEFAULT_ANALYZE_CHECK_PERIOD = 0; --- End diff -- with the logic below now of timeout / 2, is this used? > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117295#comment-16117295 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131764161 --- Diff: artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalMeasure.java --- @@ -0,0 +1,52 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.activemq.artemis.utils.critical; + +import org.jboss.logging.Logger; + +public class CriticalMeasure { + + private static final Logger logger = Logger.getLogger(CriticalMeasure.class); + + private volatile long timeEnter; + private volatile long timeLeft; + + public void enterCritical() { + timeEnter = System.currentTimeMillis(); + } + + public void leaveCritical() { + timeLeft = System.currentTimeMillis(); + } + + public boolean isExpired(long timeout) { + if (timeEnter > timeLeft) { + return System.currentTimeMillis() - timeEnter > timeout; + } + --- End diff -- nit: empty line. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117281#comment-16117281 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131763449 --- Diff: artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java --- @@ -0,0 +1,182 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.activemq.artemis.utils.critical; + +import java.util.ConcurrentModificationException; +import java.util.concurrent.CopyOnWriteArrayList; +import java.util.concurrent.Semaphore; +import java.util.concurrent.TimeUnit; + +import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet; +import org.jboss.logging.Logger; + +public class CriticalAnalyzerImpl implements CriticalAnalyzer { + + private final Logger logger = Logger.getLogger(CriticalAnalyzer.class); + + private volatile long timeout; + + private volatile long checkTime; + + @Override + public void clear() { + actions.clear(); + components.clear(); + } + + private CopyOnWriteArrayList actions = new CopyOnWriteArrayList<>(); + + private Thread thread; + + private final Semaphore running = new Semaphore(1); + + private final ConcurrentHashSet components = new ConcurrentHashSet<>(); + + @Override + public void add(CriticalComponent component) { + components.add(component); + } + + @Override + public void remove(CriticalComponent component) { + components.remove(component); + } + + @Override + public CriticalAnalyzer setCheckTime(long timeout) { + this.checkTime = timeout; + return this; + } + + @Override + public long getCheckTime() { + if (checkTime == 0) { + checkTime = getTimeout() / 2; + } + return checkTime; + } + + @Override + public CriticalAnalyzer setTimeout(long timeout) { + this.timeout = timeout; + return this; + } + + @Override + public long getTimeout() { + if (timeout == 0) { + timeout = TimeUnit.MINUTES.toMillis(2); + } + return timeout; + } + + @Override + public CriticalAnalyzer addAction(CriticalAction action) { + this.actions.add(action); + return this; + } + + @Override + public void check() { + boolean retry = true; + while (retry) { + try { +for (CriticalComponent component : components) { + + int pathReturned = component.isExpired(timeout); + if (pathReturned >= 0) { + fireAction(component, pathReturned); + // no need to keep running if there's already a component failed + return; + } +} +retry = false; // got to the end of the list, no need to retry + } catch (ConcurrentModificationException dontCare) { +// lets retry on the loop + } + } + } + + private void fireAction(CriticalComponent component, int path) { + for (CriticalAction action: actions) { + try { +action.run(component, path); + } catch (Throwable e) { +logger.warn(e.getMessage(), e); + } + } + } + + @Override + public void start() { + + if (!running.tryAcquire()) { + // already running + return; + } + + // we are not using any Thread Pool or any Scheduled Executors from the ArtemisServer + // as that would defeat the purpose,
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117268#comment-16117268 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131762773 --- Diff: artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalComponentImpl.java --- @@ -0,0 +1,68 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.activemq.artemis.utils.critical; + +/** + * This is not abstract as it could be used through aggregations or extensions. + * This is only good for cases where you call leave within the same thread as you called enter. + */ +public class CriticalComponentImpl implements CriticalComponent { + + private final CriticalMeasure[] measures; + private final CriticalAnalyzer analyzer; + + public CriticalComponentImpl(CriticalAnalyzer analyzer, int numberOfPaths) { + if (analyzer == null) { + analyzer = EmptyCriticalAnalyzer.getInstance(); + } + this.analyzer = analyzer; + + if (analyzer.isMeasuring()) { + measures = new CriticalMeasure[numberOfPaths]; + for (int i = 0; i < numberOfPaths; i++) { +measures[i] = new CriticalMeasure(); + } + } else { + measures = null; + } + } + + @Override + public void enterCritical(int path) { + if (analyzer.isMeasuring()) { + measures[path].enterCritical(); + } + --- End diff -- oops.. thx > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117256#comment-16117256 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131762073 --- Diff: artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalComponentImpl.java --- @@ -0,0 +1,68 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.activemq.artemis.utils.critical; + +/** + * This is not abstract as it could be used through aggregations or extensions. + * This is only good for cases where you call leave within the same thread as you called enter. + */ +public class CriticalComponentImpl implements CriticalComponent { + + private final CriticalMeasure[] measures; + private final CriticalAnalyzer analyzer; + + public CriticalComponentImpl(CriticalAnalyzer analyzer, int numberOfPaths) { + if (analyzer == null) { + analyzer = EmptyCriticalAnalyzer.getInstance(); + } + this.analyzer = analyzer; + + if (analyzer.isMeasuring()) { + measures = new CriticalMeasure[numberOfPaths]; + for (int i = 0; i < numberOfPaths; i++) { +measures[i] = new CriticalMeasure(); + } + } else { + measures = null; + } + } + + @Override + public void enterCritical(int path) { + if (analyzer.isMeasuring()) { + measures[path].enterCritical(); + } + --- End diff -- nit: empty line > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116873#comment-16116873 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131711379 --- Diff: artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java --- @@ -0,0 +1,182 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.activemq.artemis.utils.critical; + +import java.util.ConcurrentModificationException; +import java.util.concurrent.CopyOnWriteArrayList; +import java.util.concurrent.Semaphore; +import java.util.concurrent.TimeUnit; + +import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet; +import org.jboss.logging.Logger; + +public class CriticalAnalyzerImpl implements CriticalAnalyzer { + + private final Logger logger = Logger.getLogger(CriticalAnalyzer.class); + + private volatile long timeout; + + private volatile long checkTime; + + @Override + public void clear() { + actions.clear(); + components.clear(); + } + + private CopyOnWriteArrayList actions = new CopyOnWriteArrayList<>(); + + private Thread thread; + + private final Semaphore running = new Semaphore(1); + + private final ConcurrentHashSet components = new ConcurrentHashSet<>(); + + @Override + public void add(CriticalComponent component) { + components.add(component); + } + + @Override + public void remove(CriticalComponent component) { + components.remove(component); + } + + @Override + public CriticalAnalyzer setCheckTime(long timeout) { + this.checkTime = timeout; + return this; + } + + @Override + public long getCheckTime() { + if (checkTime == 0) { + checkTime = getTimeout() / 2; + } + return checkTime; + } + + @Override + public CriticalAnalyzer setTimeout(long timeout) { + this.timeout = timeout; + return this; + } + + @Override + public long getTimeout() { + if (timeout == 0) { + timeout = TimeUnit.MINUTES.toMillis(2); + } + return timeout; + } + + @Override + public CriticalAnalyzer addAction(CriticalAction action) { + this.actions.add(action); + return this; + } + + @Override + public void check() { + boolean retry = true; + while (retry) { + try { +for (CriticalComponent component : components) { + + int pathReturned = component.isExpired(timeout); + if (pathReturned >= 0) { + fireAction(component, pathReturned); + // no need to keep running if there's already a component failed + return; + } +} +retry = false; // got to the end of the list, no need to retry + } catch (ConcurrentModificationException dontCare) { +// lets retry on the loop + } + } + } + + private void fireAction(CriticalComponent component, int path) { + for (CriticalAction action: actions) { + try { +action.run(component, path); + } catch (Throwable e) { +logger.warn(e.getMessage(), e); + } + } + } + + @Override + public void start() { + + if (!running.tryAcquire()) { + // already running + return; + } + + // we are not using any Thread Pool or any Scheduled Executors from the ArtemisServer + // as that would defeat the purpose, +
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115941#comment-16115941 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131552826 --- Diff: docs/user-manual/en/critical-analysis.md --- @@ -0,0 +1,32 @@ +# Critical Analysis of the broker + +There are a few things that can go wrong on a production environment: + +- Bugs, for more than we try they still happen! We always try to correct them, but that's the only constant in software development. +- IO Errors, disks and hardware can go bad +- Memory issues, the CPU can go crazy by another process + +For cases like this, we added a protection to the broker to shut itself down when bad things happen. + +We measure time response in places like: + +- Queue delivery (add to the queue) +- Journal storage +- Paging operations + +If the response time goes beyond a configured timeout, the broker is considered unstable and an action will be taken to either shutdown the broker or halt the VM. + +You can use these following configuration options on broker.xml to configure how the critical analysis is performed. + + +Name | Description +:--- | :--- +analyze-critical | Enable or disable the critical analysis (default true) +analyze-critical-timeout | Timeout used to do the critical analysis (default 12 milliseconds) +analyze-critical-check-period | Time used to check the response times (default half of analyze-critical-timeout) +analyze-critical-halt | Should the VM be halted upon failures (default false) --- End diff -- i would imagine a health monitor/critical analysers threads are independent of core flow, as such it should detect and action if a component isn't responding or non healthy state. As such it should be able to declare that state back. e.g. if JVM bad state due to deadlock, that deadlock should only be affecting the threads/flow where the deadlock is , as such this component should still be servicing. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115940#comment-16115940 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131552744 --- Diff: artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java --- @@ -0,0 +1,182 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.activemq.artemis.utils.critical; + +import java.util.ConcurrentModificationException; +import java.util.concurrent.CopyOnWriteArrayList; +import java.util.concurrent.Semaphore; +import java.util.concurrent.TimeUnit; + +import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet; +import org.jboss.logging.Logger; + +public class CriticalAnalyzerImpl implements CriticalAnalyzer { + + private final Logger logger = Logger.getLogger(CriticalAnalyzer.class); + + private volatile long timeout; + + private volatile long checkTime; + + @Override + public void clear() { + actions.clear(); + components.clear(); + } + + private CopyOnWriteArrayList actions = new CopyOnWriteArrayList<>(); + + private Thread thread; + + private final Semaphore running = new Semaphore(1); + + private final ConcurrentHashSet components = new ConcurrentHashSet<>(); + + @Override + public void add(CriticalComponent component) { + components.add(component); + } + + @Override + public void remove(CriticalComponent component) { + components.remove(component); + } + + @Override + public CriticalAnalyzer setCheckTime(long timeout) { + this.checkTime = timeout; + return this; + } + + @Override + public long getCheckTime() { + if (checkTime == 0) { + checkTime = getTimeout() / 2; + } + return checkTime; + } + + @Override + public CriticalAnalyzer setTimeout(long timeout) { + this.timeout = timeout; + return this; + } + + @Override + public long getTimeout() { + if (timeout == 0) { + timeout = TimeUnit.MINUTES.toMillis(2); + } + return timeout; + } + + @Override + public CriticalAnalyzer addAction(CriticalAction action) { + this.actions.add(action); + return this; + } + + @Override + public void check() { + boolean retry = true; + while (retry) { + try { +for (CriticalComponent component : components) { + + int pathReturned = component.isExpired(timeout); + if (pathReturned >= 0) { + fireAction(component, pathReturned); + // no need to keep running if there's already a component failed + return; + } +} +retry = false; // got to the end of the list, no need to retry + } catch (ConcurrentModificationException dontCare) { +// lets retry on the loop + } + } + } + + private void fireAction(CriticalComponent component, int path) { + for (CriticalAction action: actions) { + try { +action.run(component, path); + } catch (Throwable e) { +logger.warn(e.getMessage(), e); + } + } + } + + @Override + public void start() { + + if (!running.tryAcquire()) { + // already running + return; + } + + // we are not using any Thread Pool or any Scheduled Executors from the ArtemisServer + // as that would defeat the purpose,
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115796#comment-16115796 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131541817 --- Diff: docs/user-manual/en/critical-analysis.md --- @@ -0,0 +1,32 @@ +# Critical Analysis of the broker + +There are a few things that can go wrong on a production environment: + +- Bugs, for more than we try they still happen! We always try to correct them, but that's the only constant in software development. +- IO Errors, disks and hardware can go bad +- Memory issues, the CPU can go crazy by another process + +For cases like this, we added a protection to the broker to shut itself down when bad things happen. + +We measure time response in places like: + +- Queue delivery (add to the queue) +- Journal storage +- Paging operations + +If the response time goes beyond a configured timeout, the broker is considered unstable and an action will be taken to either shutdown the broker or halt the VM. + +You can use these following configuration options on broker.xml to configure how the critical analysis is performed. + + +Name | Description +:--- | :--- +analyze-critical | Enable or disable the critical analysis (default true) +analyze-critical-timeout | Timeout used to do the critical analysis (default 12 milliseconds) +analyze-critical-check-period | Time used to check the response times (default half of analyze-critical-timeout) +analyze-critical-halt | Should the VM be halted upon failures (default false) --- End diff -- I can use management. I wasn't sure it would be useful. As if the JVM is in bad state would the message arrive ? > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115797#comment-16115797 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on the issue: https://github.com/apache/activemq-artemis/pull/1443 I will make some changes here tomorrow. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115794#comment-16115794 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131541803 --- Diff: artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java --- @@ -0,0 +1,182 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.activemq.artemis.utils.critical; + +import java.util.ConcurrentModificationException; +import java.util.concurrent.CopyOnWriteArrayList; +import java.util.concurrent.Semaphore; +import java.util.concurrent.TimeUnit; + +import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet; +import org.jboss.logging.Logger; + +public class CriticalAnalyzerImpl implements CriticalAnalyzer { + + private final Logger logger = Logger.getLogger(CriticalAnalyzer.class); + + private volatile long timeout; + + private volatile long checkTime; + + @Override + public void clear() { + actions.clear(); + components.clear(); + } + + private CopyOnWriteArrayList actions = new CopyOnWriteArrayList<>(); + + private Thread thread; + + private final Semaphore running = new Semaphore(1); + + private final ConcurrentHashSet components = new ConcurrentHashSet<>(); + + @Override + public void add(CriticalComponent component) { + components.add(component); + } + + @Override + public void remove(CriticalComponent component) { + components.remove(component); + } + + @Override + public CriticalAnalyzer setCheckTime(long timeout) { + this.checkTime = timeout; + return this; + } + + @Override + public long getCheckTime() { + if (checkTime == 0) { + checkTime = getTimeout() / 2; + } + return checkTime; + } + + @Override + public CriticalAnalyzer setTimeout(long timeout) { + this.timeout = timeout; + return this; + } + + @Override + public long getTimeout() { + if (timeout == 0) { + timeout = TimeUnit.MINUTES.toMillis(2); + } + return timeout; + } + + @Override + public CriticalAnalyzer addAction(CriticalAction action) { + this.actions.add(action); + return this; + } + + @Override + public void check() { + boolean retry = true; + while (retry) { + try { +for (CriticalComponent component : components) { + + int pathReturned = component.isExpired(timeout); + if (pathReturned >= 0) { + fireAction(component, pathReturned); + // no need to keep running if there's already a component failed + return; + } +} +retry = false; // got to the end of the list, no need to retry + } catch (ConcurrentModificationException dontCare) { +// lets retry on the loop + } + } + } + + private void fireAction(CriticalComponent component, int path) { + for (CriticalAction action: actions) { + try { +action.run(component, path); + } catch (Throwable e) { +logger.warn(e.getMessage(), e); + } + } + } + + @Override + public void start() { + + if (!running.tryAcquire()) { + // already running + return; + } + + // we are not using any Thread Pool or any Scheduled Executors from the ArtemisServer + // as that would defeat the purpose, +
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115631#comment-16115631 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131534319 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java --- @@ -2064,6 +2072,53 @@ public Configuration setNetworkCheckPing6Command(String command) { return this; } + @Override + public boolean isAnalyzeCritical() { + return analyzeCritical; + } + + @Override + public Configuration setAnalyzeCritical(boolean analyzeCritical) { + this.analyzeCritical = analyzeCritical; + return this; + } + + @Override + public long getAnalyzeCriticalTimeout() { + return analyzeCriticalTimeout; + } + + @Override + public Configuration setAnalyzeCriticalTimeout(long timeout) { + this.analyzeCriticalTimeout = timeout; + return this; + } + + @Override + public long getAnalyzeCriticalCheckPeriod() { + if (analyzeCriticalCheckPeriod == ActiveMQDefaultConfiguration.getAnalyzeCriticalCheckPeriod()) { --- End diff -- I was more getting at should the defaulting logic of using the timeout if the value is not set live in the config parser? > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115630#comment-16115630 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131534309 --- Diff: artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java --- @@ -0,0 +1,182 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.activemq.artemis.utils.critical; + +import java.util.ConcurrentModificationException; +import java.util.concurrent.CopyOnWriteArrayList; +import java.util.concurrent.Semaphore; +import java.util.concurrent.TimeUnit; + +import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet; +import org.jboss.logging.Logger; + +public class CriticalAnalyzerImpl implements CriticalAnalyzer { + + private final Logger logger = Logger.getLogger(CriticalAnalyzer.class); + + private volatile long timeout; --- End diff -- Is this one of these values not being updated by one thread and read by another then? I may have miss understood the code. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115629#comment-16115629 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131534303 --- Diff: docs/user-manual/en/critical-analysis.md --- @@ -0,0 +1,32 @@ +# Critical Analysis of the broker + +There are a few things that can go wrong on a production environment: + +- Bugs, for more than we try they still happen! We always try to correct them, but that's the only constant in software development. +- IO Errors, disks and hardware can go bad +- Memory issues, the CPU can go crazy by another process + +For cases like this, we added a protection to the broker to shut itself down when bad things happen. + +We measure time response in places like: + +- Queue delivery (add to the queue) +- Journal storage +- Paging operations + +If the response time goes beyond a configured timeout, the broker is considered unstable and an action will be taken to either shutdown the broker or halt the VM. + +You can use these following configuration options on broker.xml to configure how the critical analysis is performed. + + +Name | Description +:--- | :--- +analyze-critical | Enable or disable the critical analysis (default true) +analyze-critical-timeout | Timeout used to do the critical analysis (default 12 milliseconds) +analyze-critical-check-period | Time used to check the response times (default half of analyze-critical-timeout) +analyze-critical-halt | Should the VM be halted upon failures (default false) --- End diff -- I was more saying: how does an outside observer view the state. Eg many people rely on logs or JMX > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115627#comment-16115627 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131534288 --- Diff: artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java --- @@ -0,0 +1,182 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.activemq.artemis.utils.critical; + +import java.util.ConcurrentModificationException; +import java.util.concurrent.CopyOnWriteArrayList; +import java.util.concurrent.Semaphore; +import java.util.concurrent.TimeUnit; + +import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet; +import org.jboss.logging.Logger; + +public class CriticalAnalyzerImpl implements CriticalAnalyzer { + + private final Logger logger = Logger.getLogger(CriticalAnalyzer.class); + + private volatile long timeout; + + private volatile long checkTime; + + @Override + public void clear() { + actions.clear(); + components.clear(); + } + + private CopyOnWriteArrayList actions = new CopyOnWriteArrayList<>(); + + private Thread thread; + + private final Semaphore running = new Semaphore(1); + + private final ConcurrentHashSet components = new ConcurrentHashSet<>(); + + @Override + public void add(CriticalComponent component) { + components.add(component); + } + + @Override + public void remove(CriticalComponent component) { + components.remove(component); + } + + @Override + public CriticalAnalyzer setCheckTime(long timeout) { + this.checkTime = timeout; + return this; + } + + @Override + public long getCheckTime() { + if (checkTime == 0) { + checkTime = getTimeout() / 2; + } + return checkTime; + } + + @Override + public CriticalAnalyzer setTimeout(long timeout) { + this.timeout = timeout; + return this; + } + + @Override + public long getTimeout() { + if (timeout == 0) { + timeout = TimeUnit.MINUTES.toMillis(2); + } + return timeout; + } + + @Override + public CriticalAnalyzer addAction(CriticalAction action) { + this.actions.add(action); + return this; + } + + @Override + public void check() { + boolean retry = true; + while (retry) { + try { +for (CriticalComponent component : components) { + + int pathReturned = component.isExpired(timeout); + if (pathReturned >= 0) { + fireAction(component, pathReturned); + // no need to keep running if there's already a component failed + return; + } +} +retry = false; // got to the end of the list, no need to retry + } catch (ConcurrentModificationException dontCare) { +// lets retry on the loop + } + } + } + + private void fireAction(CriticalComponent component, int path) { + for (CriticalAction action: actions) { + try { +action.run(component, path); + } catch (Throwable e) { +logger.warn(e.getMessage(), e); + } + } + } + + @Override + public void start() { + + if (!running.tryAcquire()) { + // already running + return; + } + + // we are not using any Thread Pool or any Scheduled Executors from the ArtemisServer + // as that would defeat the purpose,
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115579#comment-16115579 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131532700 --- Diff: docs/user-manual/en/critical-analysis.md --- @@ -0,0 +1,32 @@ +# Critical Analysis of the broker + +There are a few things that can go wrong on a production environment: + +- Bugs, for more than we try they still happen! We always try to correct them, but that's the only constant in software development. +- IO Errors, disks and hardware can go bad +- Memory issues, the CPU can go crazy by another process + +For cases like this, we added a protection to the broker to shut itself down when bad things happen. + +We measure time response in places like: + +- Queue delivery (add to the queue) +- Journal storage +- Paging operations + +If the response time goes beyond a configured timeout, the broker is considered unstable and an action will be taken to either shutdown the broker or halt the VM. + +You can use these following configuration options on broker.xml to configure how the critical analysis is performed. + + +Name | Description +:--- | :--- +analyze-critical | Enable or disable the critical analysis (default true) +analyze-critical-timeout | Timeout used to do the critical analysis (default 12 milliseconds) +analyze-critical-check-period | Time used to check the response times (default half of analyze-critical-timeout) +analyze-critical-halt | Should the VM be halted upon failures (default false) --- End diff -- The idea here was to add monitoring on internal objects similar to what we do on pings / pongs through the protocols. I didn't want to make it any more complex than needed. This subject could be raised to become an independent project if you keep investing here. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115578#comment-16115578 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131532685 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ActiveMQServerImpl.java --- @@ -477,12 +484,44 @@ public final synchronized void start() throws Exception { } } + @Override + public CriticalAnalyzer getCriticalAnalyzer() { + return this.analyzer; + } + private void internalStart() throws Exception { if (state != SERVER_STATE.STOPPED) { logger.debug("Server already started!"); return; } + if (!configuration.isAnalyzeCritical()) { --- End diff -- I thought the configuration was being parsed inside ActiveMQServerImpl. (Bad memory, I forget about stuff months after working on stuff :) )... I will move it to constructor. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115577#comment-16115577 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131532658 --- Diff: artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java --- @@ -0,0 +1,182 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.activemq.artemis.utils.critical; + +import java.util.ConcurrentModificationException; +import java.util.concurrent.CopyOnWriteArrayList; +import java.util.concurrent.Semaphore; +import java.util.concurrent.TimeUnit; + +import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet; +import org.jboss.logging.Logger; + +public class CriticalAnalyzerImpl implements CriticalAnalyzer { + + private final Logger logger = Logger.getLogger(CriticalAnalyzer.class); + + private volatile long timeout; --- End diff -- I don't need it to be atomic really. as soon as the dead lock stops updating this.. it would be enough to me to just have a volatile here. The only overhead is the call to System.currentTimeMillis(); I did some tests and didn't see any regressions. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115576#comment-16115576 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131532629 --- Diff: artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java --- @@ -0,0 +1,182 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.activemq.artemis.utils.critical; + +import java.util.ConcurrentModificationException; +import java.util.concurrent.CopyOnWriteArrayList; +import java.util.concurrent.Semaphore; +import java.util.concurrent.TimeUnit; + +import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet; +import org.jboss.logging.Logger; + +public class CriticalAnalyzerImpl implements CriticalAnalyzer { + + private final Logger logger = Logger.getLogger(CriticalAnalyzer.class); + + private volatile long timeout; + + private volatile long checkTime; + + @Override + public void clear() { + actions.clear(); + components.clear(); + } + + private CopyOnWriteArrayList actions = new CopyOnWriteArrayList<>(); + + private Thread thread; + + private final Semaphore running = new Semaphore(1); + + private final ConcurrentHashSet components = new ConcurrentHashSet<>(); + + @Override + public void add(CriticalComponent component) { + components.add(component); + } + + @Override + public void remove(CriticalComponent component) { + components.remove(component); + } + + @Override + public CriticalAnalyzer setCheckTime(long timeout) { + this.checkTime = timeout; + return this; + } + + @Override + public long getCheckTime() { + if (checkTime == 0) { + checkTime = getTimeout() / 2; + } + return checkTime; + } + + @Override + public CriticalAnalyzer setTimeout(long timeout) { + this.timeout = timeout; + return this; + } + + @Override + public long getTimeout() { + if (timeout == 0) { + timeout = TimeUnit.MINUTES.toMillis(2); + } + return timeout; + } + + @Override + public CriticalAnalyzer addAction(CriticalAction action) { + this.actions.add(action); + return this; + } + + @Override + public void check() { + boolean retry = true; + while (retry) { + try { +for (CriticalComponent component : components) { + + int pathReturned = component.isExpired(timeout); + if (pathReturned >= 0) { + fireAction(component, pathReturned); + // no need to keep running if there's already a component failed + return; + } +} +retry = false; // got to the end of the list, no need to retry + } catch (ConcurrentModificationException dontCare) { +// lets retry on the loop + } + } + } + + private void fireAction(CriticalComponent component, int path) { + for (CriticalAction action: actions) { + try { +action.run(component, path); + } catch (Throwable e) { +logger.warn(e.getMessage(), e); + } + } + } + + @Override + public void start() { + + if (!running.tryAcquire()) { + // already running + return; + } + + // we are not using any Thread Pool or any Scheduled Executors from the ArtemisServer + // as that would defeat the purpose, +
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115575#comment-16115575 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131532619 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/deployers/impl/FileConfigurationParser.java --- @@ -608,6 +608,14 @@ public void parseMainConfig(final Element e, final Configuration config) throws config.setNetworkCheckPingCommand(getString(e, "network-check-ping-command", config.getNetworkCheckPingCommand(), Validators.NO_CHECK)); + config.setAnalyzeCritical(getBoolean(e, "analyze-critical", config.isAnalyzeCritical())); + + config.setAnalyzeCriticalTimeout(getLong(e, "analyze-critical-timeout", config.getAnalyzeCriticalTimeout(), Validators.GE_ZERO)); + + config.setAnalyzeCriticalCheckPeriod(getLong(e, "analyze-critical-check-period", config.getAnalyzeCriticalCheckPeriod(), Validators.GE_ZERO)); --- End diff -- I didn't want to use null here. On this case, check period=0 would mean not set. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115573#comment-16115573 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131532611 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java --- @@ -2064,6 +2072,53 @@ public Configuration setNetworkCheckPing6Command(String command) { return this; } + @Override + public boolean isAnalyzeCritical() { + return analyzeCritical; + } + + @Override + public Configuration setAnalyzeCritical(boolean analyzeCritical) { + this.analyzeCritical = analyzeCritical; + return this; + } + + @Override + public long getAnalyzeCriticalTimeout() { + return analyzeCriticalTimeout; + } + + @Override + public Configuration setAnalyzeCriticalTimeout(long timeout) { + this.analyzeCriticalTimeout = timeout; + return this; + } + + @Override + public long getAnalyzeCriticalCheckPeriod() { + if (analyzeCriticalCheckPeriod == ActiveMQDefaultConfiguration.getAnalyzeCriticalCheckPeriod()) { --- End diff -- if 0, then the we would calculate the check-period. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115569#comment-16115569 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user clebertsuconic commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131532564 --- Diff: docs/user-manual/en/critical-analysis.md --- @@ -0,0 +1,32 @@ +# Critical Analysis of the broker + +There are a few things that can go wrong on a production environment: + +- Bugs, for more than we try they still happen! We always try to correct them, but that's the only constant in software development. +- IO Errors, disks and hardware can go bad +- Memory issues, the CPU can go crazy by another process + +For cases like this, we added a protection to the broker to shut itself down when bad things happen. + +We measure time response in places like: + +- Queue delivery (add to the queue) +- Journal storage +- Paging operations + +If the response time goes beyond a configured timeout, the broker is considered unstable and an action will be taken to either shutdown the broker or halt the VM. + +You can use these following configuration options on broker.xml to configure how the critical analysis is performed. + + +Name | Description +:--- | :--- +analyze-critical | Enable or disable the critical analysis (default true) +analyze-critical-timeout | Timeout used to do the critical analysis (default 12 milliseconds) +analyze-critical-check-period | Time used to check the response times (default half of analyze-critical-timeout) +analyze-critical-halt | Should the VM be halted upon failures (default false) --- End diff -- Shutdown. Server.stop. But there is no guarantee it would work on deadlocks. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115531#comment-16115531 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131530954 --- Diff: docs/user-manual/en/critical-analysis.md --- @@ -0,0 +1,32 @@ +# Critical Analysis of the broker + +There are a few things that can go wrong on a production environment: + +- Bugs, for more than we try they still happen! We always try to correct them, but that's the only constant in software development. +- IO Errors, disks and hardware can go bad +- Memory issues, the CPU can go crazy by another process + +For cases like this, we added a protection to the broker to shut itself down when bad things happen. + +We measure time response in places like: + +- Queue delivery (add to the queue) +- Journal storage +- Paging operations + +If the response time goes beyond a configured timeout, the broker is considered unstable and an action will be taken to either shutdown the broker or halt the VM. + +You can use these following configuration options on broker.xml to configure how the critical analysis is performed. + + +Name | Description +:--- | :--- +analyze-critical | Enable or disable the critical analysis (default true) +analyze-critical-timeout | Timeout used to do the critical analysis (default 12 milliseconds) +analyze-critical-check-period | Time used to check the response times (default half of analyze-critical-timeout) +analyze-critical-halt | Should the VM be halted upon failures (default false) --- End diff -- what occurs if you have this to no halt? does it log out, also is it available as a jmx endpoint so users monitoring systems using the exposed jmx endpoints, could be configured to check and alert also to their operational support teams. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115528#comment-16115528 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131530849 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ActiveMQServerImpl.java --- @@ -477,12 +484,44 @@ public final synchronized void start() throws Exception { } } + @Override + public CriticalAnalyzer getCriticalAnalyzer() { + return this.analyzer; + } + private void internalStart() throws Exception { if (state != SERVER_STATE.STOPPED) { logger.debug("Server already started!"); return; } + if (!configuration.isAnalyzeCritical()) { --- End diff -- why not do this within object construction? > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115532#comment-16115532 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131530890 --- Diff: artemis-server/src/test/resources/ConfigurationTest-full-config.xml --- @@ -57,6 +57,10 @@ 1234567 37 123 + true --- End diff -- between documentation you name this Critical Analysis, then in config etc you switch round the words to analyse critical. suggest keeping naming constant. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115529#comment-16115529 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131530870 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ActiveMQServerImpl.java --- @@ -477,12 +484,44 @@ public final synchronized void start() throws Exception { } } + @Override + public CriticalAnalyzer getCriticalAnalyzer() { + return this.analyzer; + } + private void internalStart() throws Exception { if (state != SERVER_STATE.STOPPED) { logger.debug("Server already started!"); return; } + if (!configuration.isAnalyzeCritical()) { + this.analyzer = new EmptyCriticalAnalyzer(); + } + + /** Calling this for cases where the server was stopped and now is being restarted... failback, etc...*/ + this.analyzer.clear(); + + this.getCriticalAnalyzer().setCheckTime(configuration.getAnalyzeCriticalTimeout()).setTimeout(configuration.getAnalyzeCriticalTimeout()); --- End diff -- again why not do this in the constructor? > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115530#comment-16115530 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131530746 --- Diff: artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java --- @@ -0,0 +1,182 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.activemq.artemis.utils.critical; + +import java.util.ConcurrentModificationException; +import java.util.concurrent.CopyOnWriteArrayList; +import java.util.concurrent.Semaphore; +import java.util.concurrent.TimeUnit; + +import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet; +import org.jboss.logging.Logger; + +public class CriticalAnalyzerImpl implements CriticalAnalyzer { + + private final Logger logger = Logger.getLogger(CriticalAnalyzer.class); + + private volatile long timeout; --- End diff -- if this is on hot path, why not use an atomic or have use atomicfieldupdater (same with check time) http://normanmaurer.me/blog/2013/10/28/Lesser-known-concurrent-classes-Part-1/ > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115525#comment-16115525 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on the issue: https://github.com/apache/activemq-artemis/pull/1443 What is the over head of this, latency and throughput? > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115522#comment-16115522 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131530423 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/deployers/impl/FileConfigurationParser.java --- @@ -608,6 +608,14 @@ public void parseMainConfig(final Element e, final Configuration config) throws config.setNetworkCheckPingCommand(getString(e, "network-check-ping-command", config.getNetworkCheckPingCommand(), Validators.NO_CHECK)); + config.setAnalyzeCritical(getBoolean(e, "analyze-critical", config.isAnalyzeCritical())); + + config.setAnalyzeCriticalTimeout(getLong(e, "analyze-critical-timeout", config.getAnalyzeCriticalTimeout(), Validators.GE_ZERO)); + + config.setAnalyzeCriticalCheckPeriod(getLong(e, "analyze-critical-check-period", config.getAnalyzeCriticalCheckPeriod(), Validators.GE_ZERO)); --- End diff -- as per other review comment should the default not be half the timeout if it is not set, not some arbitarty number which then forces the section of code later. > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115521#comment-16115521 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131530496 --- Diff: artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java --- @@ -0,0 +1,182 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.activemq.artemis.utils.critical; + +import java.util.ConcurrentModificationException; +import java.util.concurrent.CopyOnWriteArrayList; +import java.util.concurrent.Semaphore; +import java.util.concurrent.TimeUnit; + +import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet; +import org.jboss.logging.Logger; + +public class CriticalAnalyzerImpl implements CriticalAnalyzer { + + private final Logger logger = Logger.getLogger(CriticalAnalyzer.class); + + private volatile long timeout; + + private volatile long checkTime; + + @Override + public void clear() { + actions.clear(); + components.clear(); + } + + private CopyOnWriteArrayList actions = new CopyOnWriteArrayList<>(); + + private Thread thread; + + private final Semaphore running = new Semaphore(1); + + private final ConcurrentHashSet components = new ConcurrentHashSet<>(); + + @Override + public void add(CriticalComponent component) { + components.add(component); + } + + @Override + public void remove(CriticalComponent component) { + components.remove(component); + } + + @Override + public CriticalAnalyzer setCheckTime(long timeout) { + this.checkTime = timeout; + return this; + } + + @Override + public long getCheckTime() { + if (checkTime == 0) { + checkTime = getTimeout() / 2; + } + return checkTime; + } + + @Override + public CriticalAnalyzer setTimeout(long timeout) { + this.timeout = timeout; + return this; + } + + @Override + public long getTimeout() { + if (timeout == 0) { + timeout = TimeUnit.MINUTES.toMillis(2); + } + return timeout; + } + + @Override + public CriticalAnalyzer addAction(CriticalAction action) { + this.actions.add(action); + return this; + } + + @Override + public void check() { + boolean retry = true; + while (retry) { + try { +for (CriticalComponent component : components) { + + int pathReturned = component.isExpired(timeout); + if (pathReturned >= 0) { + fireAction(component, pathReturned); + // no need to keep running if there's already a component failed + return; + } +} +retry = false; // got to the end of the list, no need to retry + } catch (ConcurrentModificationException dontCare) { +// lets retry on the loop + } + } + } + + private void fireAction(CriticalComponent component, int path) { + for (CriticalAction action: actions) { + try { +action.run(component, path); + } catch (Throwable e) { +logger.warn(e.getMessage(), e); + } + } + } + + @Override + public void start() { + + if (!running.tryAcquire()) { + // already running + return; + } + + // we are not using any Thread Pool or any Scheduled Executors from the ArtemisServer + // as that would defeat the purpose,
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115523#comment-16115523 ] ASF GitHub Bot commented on ARTEMIS-1324: - Github user michaelandrepearce commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1443#discussion_r131530398 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java --- @@ -2064,6 +2072,53 @@ public Configuration setNetworkCheckPing6Command(String command) { return this; } + @Override + public boolean isAnalyzeCritical() { + return analyzeCritical; + } + + @Override + public Configuration setAnalyzeCritical(boolean analyzeCritical) { + this.analyzeCritical = analyzeCritical; + return this; + } + + @Override + public long getAnalyzeCriticalTimeout() { + return analyzeCriticalTimeout; + } + + @Override + public Configuration setAnalyzeCriticalTimeout(long timeout) { + this.analyzeCriticalTimeout = timeout; + return this; + } + + @Override + public long getAnalyzeCriticalCheckPeriod() { + if (analyzeCriticalCheckPeriod == ActiveMQDefaultConfiguration.getAnalyzeCriticalCheckPeriod()) { --- End diff -- how would you differentiate between it being the default and someone actually have set it, should this defaulting not occur in config/broker.xml reading where if null then analyzeCriticalCheckPeriod is defaulted to half timeout? > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker
[ https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115278#comment-16115278 ] ASF GitHub Bot commented on ARTEMIS-1324: - GitHub user clebertsuconic opened a pull request: https://github.com/apache/activemq-artemis/pull/1443 ARTEMIS-1324 Deadlock detection and health check of critical components You can merge this pull request into a Git repository by running: $ git pull https://github.com/clebertsuconic/activemq-artemis critical-prototype Alternatively you can review and apply these changes as the patch at: https://github.com/apache/activemq-artemis/pull/1443.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1443 commit 58474c42b74d31e4727d4da58553c01533d4f4e6 Author: Clebert Suconic Date: 2017-08-05T04:52:42Z ARTEMIS-1324 Deadlock detection and health check of critical components > Critical Analysis and deadlock detection on broker > -- > > Key: ARTEMIS-1324 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1324 > Project: ActiveMQ Artemis > Issue Type: New Feature > Components: Broker >Reporter: clebert suconic >Assignee: clebert suconic >Priority: Critical > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)