[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117800#comment-16117800
 ] 

ASF subversion and git services commented on ARTEMIS-1324:
--

Commit 88f78d97ef547c5b0bb231ceb0389f7d38a9cec8 in activemq-artemis's branch 
refs/heads/master from Clebert Suconic
[ https://git-wip-us.apache.org/repos/asf?p=activemq-artemis.git;h=88f78d9 ]

ARTEMIS-1324 Fix test


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117480#comment-16117480
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user asfgit closed the pull request at:

https://github.com/apache/activemq-artemis/pull/1443


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117478#comment-16117478
 ] 

ASF subversion and git services commented on ARTEMIS-1324:
--

Commit f16af7535417e3ba3ad625d22430edda7b76b5d2 in activemq-artemis's branch 
refs/heads/master from Clebert Suconic
[ https://git-wip-us.apache.org/repos/asf?p=activemq-artemis.git;h=f16af75 ]

ARTEMIS-1324 Deadlock detection and health check of critical components


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117396#comment-16117396
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131774529
  
--- Diff: docs/user-manual/en/configuration-index.md ---
@@ -118,6 +118,11 @@ system-property-prefix | Prefix for replacing 
configuration settings using Bean
 [network-check-list](network-isolation.md) | The list of pings to be used 
on ping or InetAddress.isReacheable
 [network-check-ping-command](network-isolation.md) | The command used to 
oping IPV4 addresses
 [network-check-ping6-command](network-isolation.md) | The command used to 
oping IPV6 addresses
+[critical-analyzer](critical-analysis.md) | Enable or disable the critical 
analysis (default true)
--- End diff --

@michaelandrepearce it's just I didn't understand.


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117397#comment-16117397
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131774705
  
--- Diff: 
artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java
 ---
@@ -0,0 +1,182 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.activemq.artemis.utils.critical;
+
+import java.util.ConcurrentModificationException;
+import java.util.concurrent.CopyOnWriteArrayList;
+import java.util.concurrent.Semaphore;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet;
+import org.jboss.logging.Logger;
+
+public class CriticalAnalyzerImpl implements CriticalAnalyzer {
+
+   private final Logger logger = Logger.getLogger(CriticalAnalyzer.class);
+
+   private volatile long timeout;
+
+   private volatile long checkTime;
+
+   @Override
+   public void clear() {
+  actions.clear();
+  components.clear();
+   }
+
+   private CopyOnWriteArrayList actions = new 
CopyOnWriteArrayList<>();
+
+   private Thread thread;
+
+   private final Semaphore running = new Semaphore(1);
+
+   private final ConcurrentHashSet components = new 
ConcurrentHashSet<>();
+
+   @Override
+   public void add(CriticalComponent component) {
+  components.add(component);
+   }
+
+   @Override
+   public void remove(CriticalComponent component) {
+  components.remove(component);
+   }
+
+   @Override
+   public CriticalAnalyzer setCheckTime(long timeout) {
+  this.checkTime = timeout;
+  return this;
+   }
+
+   @Override
+   public long getCheckTime() {
+  if (checkTime == 0) {
+ checkTime = getTimeout() / 2;
+  }
+  return checkTime;
+   }
+
+   @Override
+   public CriticalAnalyzer setTimeout(long timeout) {
+  this.timeout = timeout;
+  return this;
+   }
+
+   @Override
+   public long getTimeout() {
+  if (timeout == 0) {
+ timeout = TimeUnit.MINUTES.toMillis(2);
+  }
+  return timeout;
+   }
+
+   @Override
+   public CriticalAnalyzer addAction(CriticalAction action) {
+  this.actions.add(action);
+  return this;
+   }
+
+   @Override
+   public void check() {
+  boolean retry = true;
+  while (retry) {
+ try {
+for (CriticalComponent component : components) {
+
+   int pathReturned = component.isExpired(timeout);
+   if (pathReturned >= 0) {
+  fireAction(component, pathReturned);
+  // no need to keep running if there's already a 
component failed
+  return;
+   }
+}
+retry = false; // got to the end of the list, no need to retry
+ } catch (ConcurrentModificationException dontCare) {
+// lets retry on the loop
+ }
+  }
+   }
+
+   private void fireAction(CriticalComponent component, int path) {
+  for (CriticalAction action: actions) {
+ try {
+action.run(component, path);
+ } catch (Throwable e) {
+logger.warn(e.getMessage(), e);
+ }
+  }
+   }
+
+   @Override
+   public void start() {
+
+  if (!running.tryAcquire()) {
+ // already running
+ return;
+  }
+
+  // we are not using any Thread Pool or any Scheduled Executors from 
the ArtemisServer
+  // as that would defeat the purpose,
+  // as in 

[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117387#comment-16117387
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131773734
  
--- Diff: docs/user-manual/en/critical-analysis.md ---
@@ -0,0 +1,32 @@
+# Critical Analysis of the broker
+
+There are a few things that can go wrong on a production environment:
+
+- Bugs, for more than we try they still happen! We always try to correct 
them, but that's the only constant in software development.
+- IO Errors, disks and hardware can go bad
+- Memory issues, the CPU can go crazy by another process
+
+For cases like this, we added a protection to the broker to shut itself 
down when bad things happen.
+
+We measure time response in places like:
+
+- Queue delivery (add to the queue)
+- Journal storage
+- Paging operations
+
--- End diff --

It would need to be a different feature I think. 

Monitoring has been on my radar since @franz1981 raised that PR on the CLI.


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117385#comment-16117385
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131773637
  
--- Diff: docs/user-manual/en/configuration-index.md ---
@@ -118,6 +118,11 @@ system-property-prefix | Prefix for replacing 
configuration settings using Bean
 [network-check-list](network-isolation.md) | The list of pings to be used 
on ping or InetAddress.isReacheable
 [network-check-ping-command](network-isolation.md) | The command used to 
oping IPV4 addresses
 [network-check-ping6-command](network-isolation.md) | The command used to 
oping IPV6 addresses
+[critical-analyzer](critical-analysis.md) | Enable or disable the critical 
analysis (default true)
--- End diff --

agreed this is just a nit, something that can be done later


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117386#comment-16117386
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131773683
  
--- Diff: docs/user-manual/en/critical-analysis.md ---
@@ -0,0 +1,32 @@
+# Critical Analysis of the broker
+
+There are a few things that can go wrong on a production environment:
+
+- Bugs, for more than we try they still happen! We always try to correct 
them, but that's the only constant in software development.
+- IO Errors, disks and hardware can go bad
+- Memory issues, the CPU can go crazy by another process
+
+For cases like this, we added a protection to the broker to shut itself 
down when bad things happen.
+
+We measure time response in places like:
+
+- Queue delivery (add to the queue)
+- Journal storage
+- Paging operations
+
--- End diff --

something for later btw, just a thought.


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117381#comment-16117381
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131773379
  
--- Diff: docs/user-manual/en/configuration-index.md ---
@@ -118,6 +118,11 @@ system-property-prefix | Prefix for replacing 
configuration settings using Bean
 [network-check-list](network-isolation.md) | The list of pings to be used 
on ping or InetAddress.isReacheable
 [network-check-ping-command](network-isolation.md) | The command used to 
oping IPV4 addresses
 [network-check-ping6-command](network-isolation.md) | The command used to 
oping IPV6 addresses
+[critical-analyzer](critical-analysis.md) | Enable or disable the critical 
analysis (default true)
--- End diff --

I'm not sure what you mean here. I'm adding some doc changes. Can you 
change it after merged?


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117365#comment-16117365
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131771529
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ActiveMQServerImpl.java
 ---
@@ -479,12 +499,72 @@ public final synchronized void start() throws 
Exception {
   }
}
 
+   @Override
+   public CriticalAnalyzer getCriticalAnalyzer() {
+  return this.analyzer;
+   }
+
private void internalStart() throws Exception {
   if (state != SERVER_STATE.STOPPED) {
  logger.debug("Server already started!");
  return;
   }
 
+  /** Calling this for cases where the server was stopped and now is 
being restarted... failback, etc...*/
+  this.analyzer.clear();
+
+  
this.getCriticalAnalyzer().setCheckTime(configuration.getCriticalAnalyzerCheckPeriod()).setTimeout(configuration.getCriticalAnalyzerTimeout());
+
+  if (configuration.isCriticalAnalyzer()) {
+ this.getCriticalAnalyzer().start();
+  }
+
+  this.getCriticalAnalyzer().addAction((CriticalComponent c) -> {
+ ActiveMQServerLogger.LOGGER.irresponsiveComponent(c);
+ threadDump();
+
+ // on the case of a critical failure, -1 cannot simply means 
forever.
+ // in case graceful is -1, we will set it to 30 seconds
+ long timeout = configuration.getGracefulShutdownTimeout() < 0 ? 
3 : configuration.getGracefulShutdownTimeout();
+
+ Thread notificationSender = new Thread() {
+public void run () {
--- End diff --

should this need an Overrides?


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117360#comment-16117360
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131771146
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java
 ---
@@ -2064,6 +2072,53 @@ public Configuration 
setNetworkCheckPing6Command(String command) {
   return this;
}
 
+   @Override
+   public boolean isCriticalAnalyzer() {
+  return criticalAnalyzer;
+   }
+
+   @Override
+   public Configuration setCriticalAnalyzer(boolean CriticalAnalyzer) {
+  this.criticalAnalyzer = CriticalAnalyzer;
+  return this;
+   }
+
+   @Override
+   public long getCriticalAnalyzerTimeout() {
+  return criticalAnalyzerTimeout;
+   }
+
+   @Override
+   public Configuration setCriticalAnalyzerTimeout(long timeout) {
+  this.criticalAnalyzerTimeout = timeout;
+  return this;
+   }
+
+   @Override
+   public long getCriticalAnalyzerCheckPeriod() {
+  if (criticalAnalyzerCheckPeriod <= 0) {
--- End diff --

agreed what is there is fine, these are only code nits


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117358#comment-16117358
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131770829
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java
 ---
@@ -294,6 +294,14 @@
 
private String internalNamingPrefix = 
ActiveMQDefaultConfiguration.getInternalNamingPrefix();
 
+   private boolean criticalAnalyzer = 
ActiveMQDefaultConfiguration.getCriticalAnalyzer();
+
+   private boolean criticalAnalyzerHalt = 
ActiveMQDefaultConfiguration.getCriticalAnalyzerHalt();
+
+   private long criticalAnalyzerTimeout = 
ActiveMQDefaultConfiguration.getCriticalAnalyzerTimeout();
+
+   private long criticalAnalyzerCheckPeriod = 0; // non set
--- End diff --

i was meaning make a static contstant ANALYZE_CHECK_PERIOD_NOT_SET=0 and 
use that, it make it less of a magic number.


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117353#comment-16117353
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131769789
  
--- Diff: docs/user-manual/en/critical-analysis.md ---
@@ -0,0 +1,32 @@
+# Critical Analysis of the broker
+
+There are a few things that can go wrong on a production environment:
+
+- Bugs, for more than we try they still happen! We always try to correct 
them, but that's the only constant in software development.
+- IO Errors, disks and hardware can go bad
+- Memory issues, the CPU can go crazy by another process
+
+For cases like this, we added a protection to the broker to shut itself 
down when bad things happen.
+
+We measure time response in places like:
+
+- Queue delivery (add to the queue)
+- Journal storage
+- Paging operations
+
--- End diff --

I was just thinking the averages and percentiles of this, would make great 
things to be monitoring when healthy.


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117352#comment-16117352
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131769469
  
--- Diff: 
artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java
 ---
@@ -0,0 +1,182 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.activemq.artemis.utils.critical;
+
+import java.util.ConcurrentModificationException;
+import java.util.concurrent.CopyOnWriteArrayList;
+import java.util.concurrent.Semaphore;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet;
+import org.jboss.logging.Logger;
+
+public class CriticalAnalyzerImpl implements CriticalAnalyzer {
+
+   private final Logger logger = Logger.getLogger(CriticalAnalyzer.class);
+
+   private volatile long timeout;
+
+   private volatile long checkTime;
+
+   @Override
+   public void clear() {
+  actions.clear();
+  components.clear();
+   }
+
+   private CopyOnWriteArrayList actions = new 
CopyOnWriteArrayList<>();
+
+   private Thread thread;
+
+   private final Semaphore running = new Semaphore(1);
+
+   private final ConcurrentHashSet components = new 
ConcurrentHashSet<>();
+
+   @Override
+   public void add(CriticalComponent component) {
+  components.add(component);
+   }
+
+   @Override
+   public void remove(CriticalComponent component) {
+  components.remove(component);
+   }
+
+   @Override
+   public CriticalAnalyzer setCheckTime(long timeout) {
+  this.checkTime = timeout;
+  return this;
+   }
+
+   @Override
+   public long getCheckTime() {
+  if (checkTime == 0) {
+ checkTime = getTimeout() / 2;
+  }
+  return checkTime;
+   }
+
+   @Override
+   public CriticalAnalyzer setTimeout(long timeout) {
+  this.timeout = timeout;
+  return this;
+   }
+
+   @Override
+   public long getTimeout() {
+  if (timeout == 0) {
+ timeout = TimeUnit.MINUTES.toMillis(2);
+  }
+  return timeout;
+   }
+
+   @Override
+   public CriticalAnalyzer addAction(CriticalAction action) {
+  this.actions.add(action);
+  return this;
+   }
+
+   @Override
+   public void check() {
+  boolean retry = true;
+  while (retry) {
+ try {
+for (CriticalComponent component : components) {
+
+   int pathReturned = component.isExpired(timeout);
+   if (pathReturned >= 0) {
+  fireAction(component, pathReturned);
+  // no need to keep running if there's already a 
component failed
+  return;
+   }
+}
+retry = false; // got to the end of the list, no need to retry
+ } catch (ConcurrentModificationException dontCare) {
+// lets retry on the loop
+ }
+  }
+   }
+
+   private void fireAction(CriticalComponent component, int path) {
+  for (CriticalAction action: actions) {
+ try {
+action.run(component, path);
+ } catch (Throwable e) {
+logger.warn(e.getMessage(), e);
+ }
+  }
+   }
+
+   @Override
+   public void start() {
+
+  if (!running.tryAcquire()) {
+ // already running
+ return;
+  }
+
+  // we are not using any Thread Pool or any Scheduled Executors from 
the ArtemisServer
+  // as that would defeat the purpose,
+  // as in 

[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117348#comment-16117348
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131769055
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java
 ---
@@ -2064,6 +2072,53 @@ public Configuration 
setNetworkCheckPing6Command(String command) {
   return this;
}
 
+   @Override
+   public boolean isCriticalAnalyzer() {
+  return criticalAnalyzer;
+   }
+
+   @Override
+   public Configuration setCriticalAnalyzer(boolean CriticalAnalyzer) {
+  this.criticalAnalyzer = CriticalAnalyzer;
+  return this;
+   }
+
+   @Override
+   public long getCriticalAnalyzerTimeout() {
+  return criticalAnalyzerTimeout;
+   }
+
+   @Override
+   public Configuration setCriticalAnalyzerTimeout(long timeout) {
+  this.criticalAnalyzerTimeout = timeout;
+  return this;
+   }
+
+   @Override
+   public long getCriticalAnalyzerCheckPeriod() {
+  if (criticalAnalyzerCheckPeriod <= 0) {
--- End diff --

I can always merge what I have now.. and make changes later :)


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117343#comment-16117343
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131768837
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java
 ---
@@ -294,6 +294,14 @@
 
private String internalNamingPrefix = 
ActiveMQDefaultConfiguration.getInternalNamingPrefix();
 
+   private boolean criticalAnalyzer = 
ActiveMQDefaultConfiguration.getCriticalAnalyzer();
+
+   private boolean criticalAnalyzerHalt = 
ActiveMQDefaultConfiguration.getCriticalAnalyzerHalt();
+
+   private long criticalAnalyzerTimeout = 
ActiveMQDefaultConfiguration.getCriticalAnalyzerTimeout();
+
+   private long criticalAnalyzerCheckPeriod = 0; // non set
--- End diff --

I will just set 0.. meaning not set. I don't want to make this an Object 
just for this.


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117335#comment-16117335
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131768028
  
--- Diff: 
artemis-core-client/src/main/java/org/apache/activemq/artemis/api/config/ActiveMQDefaultConfiguration.java
 ---
@@ -477,6 +477,16 @@ public static String 
getDefaultHapolicyBackupStrategy() {
 
public static int DEFAULT_QUORUM_SIZE = -1;
 
+   public static final boolean DEFAULT_ANALYZE_CRITICAL = true;
+
+   public static final long DEFAULT_ANALYZE_CRITICAL_TIMEOUT = 12;
+
+   // this will be 0, the implementation should return 1/2 of the 
configured critical timeout
+   public static final long DEFAULT_ANALYZE_CHECK_PERIOD = 0;
--- End diff --

@clebertsuconic i put some other comments, maybe keep it and not have magic 
numbers in the other code, I've pointed out.


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117333#comment-16117333
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/1443
  
@clebertsuconic I've left more code nits, and user feedback around the 
docs. 

Not really able to put the code through its paces, no doubt won't be able 
to do that till it hits a real environment, it seems reasonable the pseudo 
logic you describe. 

Also the test cases added look like they cover some good scenarios.


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117327#comment-16117327
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131766984
  
--- Diff: docs/user-manual/en/configuration-index.md ---
@@ -118,6 +118,11 @@ system-property-prefix | Prefix for replacing 
configuration settings using Bean
 [network-check-list](network-isolation.md) | The list of pings to be used 
on ping or InetAddress.isReacheable
 [network-check-ping-command](network-isolation.md) | The command used to 
oping IPV4 addresses
 [network-check-ping6-command](network-isolation.md) | The command used to 
oping IPV6 addresses
+[critical-analyzer](critical-analysis.md) | Enable or disable the critical 
analysis (default true)
--- End diff --

just a note if we change descriptions in critical-analysis based on review 
comments, probably should make these match. 


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117326#comment-16117326
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131766851
  
--- Diff: docs/user-manual/en/critical-analysis.md ---
@@ -0,0 +1,32 @@
+# Critical Analysis of the broker
+
+There are a few things that can go wrong on a production environment:
+
+- Bugs, for more than we try they still happen! We always try to correct 
them, but that's the only constant in software development.
+- IO Errors, disks and hardware can go bad
+- Memory issues, the CPU can go crazy by another process
+
+For cases like this, we added a protection to the broker to shut itself 
down when bad things happen.
+
+We measure time response in places like:
+
+- Queue delivery (add to the queue)
+- Journal storage
+- Paging operations
+
+If the response time goes beyond a configured timeout, the broker is 
considered unstable and an action will be taken to either shutdown the broker 
or halt the VM.
+
+You can use these following configuration options on broker.xml to 
configure how the critical analysis is performed.
+
+
+Name | Description
+:--- | :---
+critical-analyzer | Enable or disable the critical analysis (default true)
+critical-analyzer-timeout | Timeout used to do the critical analysis 
(default 12 milliseconds)
+critical-analyzer-check-period | Time used to check the response times 
(default half of critical-analyzer-timeout)
+critical-analyzer-halt | Should the VM be halted upon failures (default 
false)
--- End diff --

describe behaviour if true?


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117324#comment-16117324
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131766464
  
--- Diff: docs/user-manual/en/critical-analysis.md ---
@@ -0,0 +1,32 @@
+# Critical Analysis of the broker
+
+There are a few things that can go wrong on a production environment:
+
+- Bugs, for more than we try they still happen! We always try to correct 
them, but that's the only constant in software development.
+- IO Errors, disks and hardware can go bad
+- Memory issues, the CPU can go crazy by another process
+
+For cases like this, we added a protection to the broker to shut itself 
down when bad things happen.
+
+We measure time response in places like:
+
+- Queue delivery (add to the queue)
+- Journal storage
+- Paging operations
+
+If the response time goes beyond a configured timeout, the broker is 
considered unstable and an action will be taken to either shutdown the broker 
or halt the VM.
+
+You can use these following configuration options on broker.xml to 
configure how the critical analysis is performed.
+
+
+Name | Description
+:--- | :---
+critical-analyzer | Enable or disable the critical analysis (default true)
--- End diff --

maybe be explicit in what value enable and disable should take - as in:

Enable (true) or disable (false) 


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117320#comment-16117320
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131766104
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java
 ---
@@ -2064,6 +2072,53 @@ public Configuration 
setNetworkCheckPing6Command(String command) {
   return this;
}
 
+   @Override
+   public boolean isCriticalAnalyzer() {
+  return criticalAnalyzer;
+   }
+
+   @Override
+   public Configuration setCriticalAnalyzer(boolean CriticalAnalyzer) {
+  this.criticalAnalyzer = CriticalAnalyzer;
+  return this;
+   }
+
+   @Override
+   public long getCriticalAnalyzerTimeout() {
+  return criticalAnalyzerTimeout;
+   }
+
+   @Override
+   public Configuration setCriticalAnalyzerTimeout(long timeout) {
+  this.criticalAnalyzerTimeout = timeout;
+  return this;
+   }
+
+   @Override
+   public long getCriticalAnalyzerCheckPeriod() {
+  if (criticalAnalyzerCheckPeriod <= 0) {
--- End diff --

ditto i assume 0 should be constant DEFAULT_ANALYZE_CHECK_PERIOD


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117317#comment-16117317
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131765975
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java
 ---
@@ -294,6 +294,14 @@
 
private String internalNamingPrefix = 
ActiveMQDefaultConfiguration.getInternalNamingPrefix();
 
+   private boolean criticalAnalyzer = 
ActiveMQDefaultConfiguration.getCriticalAnalyzer();
+
+   private boolean criticalAnalyzerHalt = 
ActiveMQDefaultConfiguration.getCriticalAnalyzerHalt();
+
+   private long criticalAnalyzerTimeout = 
ActiveMQDefaultConfiguration.getCriticalAnalyzerTimeout();
+
+   private long criticalAnalyzerCheckPeriod = 0; // non set
--- End diff --

I guess 0 should be a reference to constant DEFAULT_ANALYZE_CHECK_PERIOD i 
called out earlier as having no reference.


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117313#comment-16117313
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131765567
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java
 ---
@@ -294,6 +294,14 @@
 
private String internalNamingPrefix = 
ActiveMQDefaultConfiguration.getInternalNamingPrefix();
 
+   private boolean criticalAnalyzer = 
ActiveMQDefaultConfiguration.getCriticalAnalyzer();
+
+   private boolean criticalAnalyzerHalt = 
ActiveMQDefaultConfiguration.getCriticalAnalyzerHalt();
+
+   private long criticalAnalyzerTimeout = 
ActiveMQDefaultConfiguration.getCriticalAnalyzerTimeout();
+
+   private long criticalAnalyzerCheckPeriod = 0; // non set
--- End diff --

ignore, just worked out whats going on here. 


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117309#comment-16117309
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131765286
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java
 ---
@@ -294,6 +294,14 @@
 
private String internalNamingPrefix = 
ActiveMQDefaultConfiguration.getInternalNamingPrefix();
 
+   private boolean criticalAnalyzer = 
ActiveMQDefaultConfiguration.getCriticalAnalyzer();
+
+   private boolean criticalAnalyzerHalt = 
ActiveMQDefaultConfiguration.getCriticalAnalyzerHalt();
+
+   private long criticalAnalyzerTimeout = 
ActiveMQDefaultConfiguration.getCriticalAnalyzerTimeout();
+
+   private long criticalAnalyzerCheckPeriod = 0; // non set
--- End diff --

@michaelandrepearce nope.. because the value will only be determined during 
runtime (when getCriticalAnalyzerCheckPeriod is called.


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117310#comment-16117310
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131765381
  
--- Diff: 
artemis-core-client/src/main/java/org/apache/activemq/artemis/api/config/ActiveMQDefaultConfiguration.java
 ---
@@ -477,6 +477,16 @@ public static String 
getDefaultHapolicyBackupStrategy() {
 
public static int DEFAULT_QUORUM_SIZE = -1;
 
+   public static final boolean DEFAULT_ANALYZE_CRITICAL = true;
+
+   public static final long DEFAULT_ANALYZE_CRITICAL_TIMEOUT = 12;
+
+   // this will be 0, the implementation should return 1/2 of the 
configured critical timeout
+   public static final long DEFAULT_ANALYZE_CHECK_PERIOD = 0;
--- End diff --

nope.. will remove it


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117306#comment-16117306
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131765096
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java
 ---
@@ -294,6 +294,14 @@
 
private String internalNamingPrefix = 
ActiveMQDefaultConfiguration.getInternalNamingPrefix();
 
+   private boolean criticalAnalyzer = 
ActiveMQDefaultConfiguration.getCriticalAnalyzer();
+
+   private boolean criticalAnalyzerHalt = 
ActiveMQDefaultConfiguration.getCriticalAnalyzerHalt();
+
+   private long criticalAnalyzerTimeout = 
ActiveMQDefaultConfiguration.getCriticalAnalyzerTimeout();
+
+   private long criticalAnalyzerCheckPeriod = 0; // non set
--- End diff --

should this not be criticalAnalyzerCheckPeriod = 
ActiveMQDefaultConfiguration.getCriticalAnalyzerCheckPeriod(criticalAnalyzerTimeout)


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117300#comment-16117300
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131764621
  
--- Diff: 
artemis-core-client/src/main/java/org/apache/activemq/artemis/api/config/ActiveMQDefaultConfiguration.java
 ---
@@ -477,6 +477,16 @@ public static String 
getDefaultHapolicyBackupStrategy() {
 
public static int DEFAULT_QUORUM_SIZE = -1;
 
+   public static final boolean DEFAULT_ANALYZE_CRITICAL = true;
+
+   public static final long DEFAULT_ANALYZE_CRITICAL_TIMEOUT = 12;
+
+   // this will be 0, the implementation should return 1/2 of the 
configured critical timeout
+   public static final long DEFAULT_ANALYZE_CHECK_PERIOD = 0;
--- End diff --

with the logic below now of timeout / 2, is this used?


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117295#comment-16117295
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131764161
  
--- Diff: 
artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalMeasure.java
 ---
@@ -0,0 +1,52 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.activemq.artemis.utils.critical;
+
+import org.jboss.logging.Logger;
+
+public class CriticalMeasure {
+
+   private static final Logger logger = 
Logger.getLogger(CriticalMeasure.class);
+
+   private volatile long timeEnter;
+   private volatile long timeLeft;
+
+   public void enterCritical() {
+  timeEnter = System.currentTimeMillis();
+   }
+
+   public void leaveCritical() {
+  timeLeft = System.currentTimeMillis();
+   }
+
+   public boolean isExpired(long timeout) {
+  if (timeEnter > timeLeft) {
+ return System.currentTimeMillis() - timeEnter > timeout;
+  }
+
--- End diff --

nit: empty line.


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117281#comment-16117281
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131763449
  
--- Diff: 
artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java
 ---
@@ -0,0 +1,182 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.activemq.artemis.utils.critical;
+
+import java.util.ConcurrentModificationException;
+import java.util.concurrent.CopyOnWriteArrayList;
+import java.util.concurrent.Semaphore;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet;
+import org.jboss.logging.Logger;
+
+public class CriticalAnalyzerImpl implements CriticalAnalyzer {
+
+   private final Logger logger = Logger.getLogger(CriticalAnalyzer.class);
+
+   private volatile long timeout;
+
+   private volatile long checkTime;
+
+   @Override
+   public void clear() {
+  actions.clear();
+  components.clear();
+   }
+
+   private CopyOnWriteArrayList actions = new 
CopyOnWriteArrayList<>();
+
+   private Thread thread;
+
+   private final Semaphore running = new Semaphore(1);
+
+   private final ConcurrentHashSet components = new 
ConcurrentHashSet<>();
+
+   @Override
+   public void add(CriticalComponent component) {
+  components.add(component);
+   }
+
+   @Override
+   public void remove(CriticalComponent component) {
+  components.remove(component);
+   }
+
+   @Override
+   public CriticalAnalyzer setCheckTime(long timeout) {
+  this.checkTime = timeout;
+  return this;
+   }
+
+   @Override
+   public long getCheckTime() {
+  if (checkTime == 0) {
+ checkTime = getTimeout() / 2;
+  }
+  return checkTime;
+   }
+
+   @Override
+   public CriticalAnalyzer setTimeout(long timeout) {
+  this.timeout = timeout;
+  return this;
+   }
+
+   @Override
+   public long getTimeout() {
+  if (timeout == 0) {
+ timeout = TimeUnit.MINUTES.toMillis(2);
+  }
+  return timeout;
+   }
+
+   @Override
+   public CriticalAnalyzer addAction(CriticalAction action) {
+  this.actions.add(action);
+  return this;
+   }
+
+   @Override
+   public void check() {
+  boolean retry = true;
+  while (retry) {
+ try {
+for (CriticalComponent component : components) {
+
+   int pathReturned = component.isExpired(timeout);
+   if (pathReturned >= 0) {
+  fireAction(component, pathReturned);
+  // no need to keep running if there's already a 
component failed
+  return;
+   }
+}
+retry = false; // got to the end of the list, no need to retry
+ } catch (ConcurrentModificationException dontCare) {
+// lets retry on the loop
+ }
+  }
+   }
+
+   private void fireAction(CriticalComponent component, int path) {
+  for (CriticalAction action: actions) {
+ try {
+action.run(component, path);
+ } catch (Throwable e) {
+logger.warn(e.getMessage(), e);
+ }
+  }
+   }
+
+   @Override
+   public void start() {
+
+  if (!running.tryAcquire()) {
+ // already running
+ return;
+  }
+
+  // we are not using any Thread Pool or any Scheduled Executors from 
the ArtemisServer
+  // as that would defeat the purpose,
+  // as 

[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117268#comment-16117268
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131762773
  
--- Diff: 
artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalComponentImpl.java
 ---
@@ -0,0 +1,68 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.activemq.artemis.utils.critical;
+
+/**
+ * This is not abstract as it could be used through aggregations or 
extensions.
+ * This is only good for cases where you call leave within the same thread 
as you called enter.
+ */
+public class CriticalComponentImpl implements CriticalComponent {
+
+   private final CriticalMeasure[] measures;
+   private final CriticalAnalyzer analyzer;
+
+   public CriticalComponentImpl(CriticalAnalyzer analyzer, int 
numberOfPaths) {
+  if (analyzer == null) {
+ analyzer = EmptyCriticalAnalyzer.getInstance();
+  }
+  this.analyzer = analyzer;
+
+  if (analyzer.isMeasuring()) {
+ measures = new CriticalMeasure[numberOfPaths];
+ for (int i = 0; i < numberOfPaths; i++) {
+measures[i] = new CriticalMeasure();
+ }
+  } else {
+ measures = null;
+  }
+   }
+
+   @Override
+   public void enterCritical(int path) {
+  if (analyzer.isMeasuring()) {
+ measures[path].enterCritical();
+  }
+
--- End diff --

oops.. thx


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117256#comment-16117256
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131762073
  
--- Diff: 
artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalComponentImpl.java
 ---
@@ -0,0 +1,68 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.activemq.artemis.utils.critical;
+
+/**
+ * This is not abstract as it could be used through aggregations or 
extensions.
+ * This is only good for cases where you call leave within the same thread 
as you called enter.
+ */
+public class CriticalComponentImpl implements CriticalComponent {
+
+   private final CriticalMeasure[] measures;
+   private final CriticalAnalyzer analyzer;
+
+   public CriticalComponentImpl(CriticalAnalyzer analyzer, int 
numberOfPaths) {
+  if (analyzer == null) {
+ analyzer = EmptyCriticalAnalyzer.getInstance();
+  }
+  this.analyzer = analyzer;
+
+  if (analyzer.isMeasuring()) {
+ measures = new CriticalMeasure[numberOfPaths];
+ for (int i = 0; i < numberOfPaths; i++) {
+measures[i] = new CriticalMeasure();
+ }
+  } else {
+ measures = null;
+  }
+   }
+
+   @Override
+   public void enterCritical(int path) {
+  if (analyzer.isMeasuring()) {
+ measures[path].enterCritical();
+  }
+
--- End diff --

nit: empty line


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116873#comment-16116873
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131711379
  
--- Diff: 
artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java
 ---
@@ -0,0 +1,182 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.activemq.artemis.utils.critical;
+
+import java.util.ConcurrentModificationException;
+import java.util.concurrent.CopyOnWriteArrayList;
+import java.util.concurrent.Semaphore;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet;
+import org.jboss.logging.Logger;
+
+public class CriticalAnalyzerImpl implements CriticalAnalyzer {
+
+   private final Logger logger = Logger.getLogger(CriticalAnalyzer.class);
+
+   private volatile long timeout;
+
+   private volatile long checkTime;
+
+   @Override
+   public void clear() {
+  actions.clear();
+  components.clear();
+   }
+
+   private CopyOnWriteArrayList actions = new 
CopyOnWriteArrayList<>();
+
+   private Thread thread;
+
+   private final Semaphore running = new Semaphore(1);
+
+   private final ConcurrentHashSet components = new 
ConcurrentHashSet<>();
+
+   @Override
+   public void add(CriticalComponent component) {
+  components.add(component);
+   }
+
+   @Override
+   public void remove(CriticalComponent component) {
+  components.remove(component);
+   }
+
+   @Override
+   public CriticalAnalyzer setCheckTime(long timeout) {
+  this.checkTime = timeout;
+  return this;
+   }
+
+   @Override
+   public long getCheckTime() {
+  if (checkTime == 0) {
+ checkTime = getTimeout() / 2;
+  }
+  return checkTime;
+   }
+
+   @Override
+   public CriticalAnalyzer setTimeout(long timeout) {
+  this.timeout = timeout;
+  return this;
+   }
+
+   @Override
+   public long getTimeout() {
+  if (timeout == 0) {
+ timeout = TimeUnit.MINUTES.toMillis(2);
+  }
+  return timeout;
+   }
+
+   @Override
+   public CriticalAnalyzer addAction(CriticalAction action) {
+  this.actions.add(action);
+  return this;
+   }
+
+   @Override
+   public void check() {
+  boolean retry = true;
+  while (retry) {
+ try {
+for (CriticalComponent component : components) {
+
+   int pathReturned = component.isExpired(timeout);
+   if (pathReturned >= 0) {
+  fireAction(component, pathReturned);
+  // no need to keep running if there's already a 
component failed
+  return;
+   }
+}
+retry = false; // got to the end of the list, no need to retry
+ } catch (ConcurrentModificationException dontCare) {
+// lets retry on the loop
+ }
+  }
+   }
+
+   private void fireAction(CriticalComponent component, int path) {
+  for (CriticalAction action: actions) {
+ try {
+action.run(component, path);
+ } catch (Throwable e) {
+logger.warn(e.getMessage(), e);
+ }
+  }
+   }
+
+   @Override
+   public void start() {
+
+  if (!running.tryAcquire()) {
+ // already running
+ return;
+  }
+
+  // we are not using any Thread Pool or any Scheduled Executors from 
the ArtemisServer
+  // as that would defeat the purpose,
+  // as in 

[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115941#comment-16115941
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131552826
  
--- Diff: docs/user-manual/en/critical-analysis.md ---
@@ -0,0 +1,32 @@
+# Critical Analysis of the broker
+
+There are a few things that can go wrong on a production environment:
+
+- Bugs, for more than we try they still happen! We always try to correct 
them, but that's the only constant in software development.
+- IO Errors, disks and hardware can go bad
+- Memory issues, the CPU can go crazy by another process
+
+For cases like this, we added a protection to the broker to shut itself 
down when bad things happen.
+
+We measure time response in places like:
+
+- Queue delivery (add to the queue)
+- Journal storage
+- Paging operations
+
+If the response time goes beyond a configured timeout, the broker is 
considered unstable and an action will be taken to either shutdown the broker 
or halt the VM.
+
+You can use these following configuration options on broker.xml to 
configure how the critical analysis is performed.
+
+
+Name | Description
+:--- | :---
+analyze-critical | Enable or disable the critical analysis (default true)
+analyze-critical-timeout | Timeout used to do the critical analysis 
(default 12 milliseconds)
+analyze-critical-check-period | Time used to check the response times 
(default half of analyze-critical-timeout)
+analyze-critical-halt | Should the VM be halted upon failures (default 
false)
--- End diff --

i would imagine a health monitor/critical analysers threads are independent 
of core flow, as such it should detect and action if a component isn't 
responding or non healthy state. As such it should be able to declare that 
state back. 

e.g. if JVM bad state due to deadlock, that deadlock should only be 
affecting the threads/flow where the deadlock is , as such this component 
should still be servicing.


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115940#comment-16115940
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131552744
  
--- Diff: 
artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java
 ---
@@ -0,0 +1,182 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.activemq.artemis.utils.critical;
+
+import java.util.ConcurrentModificationException;
+import java.util.concurrent.CopyOnWriteArrayList;
+import java.util.concurrent.Semaphore;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet;
+import org.jboss.logging.Logger;
+
+public class CriticalAnalyzerImpl implements CriticalAnalyzer {
+
+   private final Logger logger = Logger.getLogger(CriticalAnalyzer.class);
+
+   private volatile long timeout;
+
+   private volatile long checkTime;
+
+   @Override
+   public void clear() {
+  actions.clear();
+  components.clear();
+   }
+
+   private CopyOnWriteArrayList actions = new 
CopyOnWriteArrayList<>();
+
+   private Thread thread;
+
+   private final Semaphore running = new Semaphore(1);
+
+   private final ConcurrentHashSet components = new 
ConcurrentHashSet<>();
+
+   @Override
+   public void add(CriticalComponent component) {
+  components.add(component);
+   }
+
+   @Override
+   public void remove(CriticalComponent component) {
+  components.remove(component);
+   }
+
+   @Override
+   public CriticalAnalyzer setCheckTime(long timeout) {
+  this.checkTime = timeout;
+  return this;
+   }
+
+   @Override
+   public long getCheckTime() {
+  if (checkTime == 0) {
+ checkTime = getTimeout() / 2;
+  }
+  return checkTime;
+   }
+
+   @Override
+   public CriticalAnalyzer setTimeout(long timeout) {
+  this.timeout = timeout;
+  return this;
+   }
+
+   @Override
+   public long getTimeout() {
+  if (timeout == 0) {
+ timeout = TimeUnit.MINUTES.toMillis(2);
+  }
+  return timeout;
+   }
+
+   @Override
+   public CriticalAnalyzer addAction(CriticalAction action) {
+  this.actions.add(action);
+  return this;
+   }
+
+   @Override
+   public void check() {
+  boolean retry = true;
+  while (retry) {
+ try {
+for (CriticalComponent component : components) {
+
+   int pathReturned = component.isExpired(timeout);
+   if (pathReturned >= 0) {
+  fireAction(component, pathReturned);
+  // no need to keep running if there's already a 
component failed
+  return;
+   }
+}
+retry = false; // got to the end of the list, no need to retry
+ } catch (ConcurrentModificationException dontCare) {
+// lets retry on the loop
+ }
+  }
+   }
+
+   private void fireAction(CriticalComponent component, int path) {
+  for (CriticalAction action: actions) {
+ try {
+action.run(component, path);
+ } catch (Throwable e) {
+logger.warn(e.getMessage(), e);
+ }
+  }
+   }
+
+   @Override
+   public void start() {
+
+  if (!running.tryAcquire()) {
+ // already running
+ return;
+  }
+
+  // we are not using any Thread Pool or any Scheduled Executors from 
the ArtemisServer
+  // as that would defeat the purpose,
+  // as 

[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115796#comment-16115796
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131541817
  
--- Diff: docs/user-manual/en/critical-analysis.md ---
@@ -0,0 +1,32 @@
+# Critical Analysis of the broker
+
+There are a few things that can go wrong on a production environment:
+
+- Bugs, for more than we try they still happen! We always try to correct 
them, but that's the only constant in software development.
+- IO Errors, disks and hardware can go bad
+- Memory issues, the CPU can go crazy by another process
+
+For cases like this, we added a protection to the broker to shut itself 
down when bad things happen.
+
+We measure time response in places like:
+
+- Queue delivery (add to the queue)
+- Journal storage
+- Paging operations
+
+If the response time goes beyond a configured timeout, the broker is 
considered unstable and an action will be taken to either shutdown the broker 
or halt the VM.
+
+You can use these following configuration options on broker.xml to 
configure how the critical analysis is performed.
+
+
+Name | Description
+:--- | :---
+analyze-critical | Enable or disable the critical analysis (default true)
+analyze-critical-timeout | Timeout used to do the critical analysis 
(default 12 milliseconds)
+analyze-critical-check-period | Time used to check the response times 
(default half of analyze-critical-timeout)
+analyze-critical-halt | Should the VM be halted upon failures (default 
false)
--- End diff --

I can use management.  

I wasn't sure it would be useful. As if the JVM is in bad state would the 
message arrive ?


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115797#comment-16115797
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on the issue:

https://github.com/apache/activemq-artemis/pull/1443
  
I will make some changes here tomorrow. 


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115794#comment-16115794
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131541803
  
--- Diff: 
artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java
 ---
@@ -0,0 +1,182 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.activemq.artemis.utils.critical;
+
+import java.util.ConcurrentModificationException;
+import java.util.concurrent.CopyOnWriteArrayList;
+import java.util.concurrent.Semaphore;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet;
+import org.jboss.logging.Logger;
+
+public class CriticalAnalyzerImpl implements CriticalAnalyzer {
+
+   private final Logger logger = Logger.getLogger(CriticalAnalyzer.class);
+
+   private volatile long timeout;
+
+   private volatile long checkTime;
+
+   @Override
+   public void clear() {
+  actions.clear();
+  components.clear();
+   }
+
+   private CopyOnWriteArrayList actions = new 
CopyOnWriteArrayList<>();
+
+   private Thread thread;
+
+   private final Semaphore running = new Semaphore(1);
+
+   private final ConcurrentHashSet components = new 
ConcurrentHashSet<>();
+
+   @Override
+   public void add(CriticalComponent component) {
+  components.add(component);
+   }
+
+   @Override
+   public void remove(CriticalComponent component) {
+  components.remove(component);
+   }
+
+   @Override
+   public CriticalAnalyzer setCheckTime(long timeout) {
+  this.checkTime = timeout;
+  return this;
+   }
+
+   @Override
+   public long getCheckTime() {
+  if (checkTime == 0) {
+ checkTime = getTimeout() / 2;
+  }
+  return checkTime;
+   }
+
+   @Override
+   public CriticalAnalyzer setTimeout(long timeout) {
+  this.timeout = timeout;
+  return this;
+   }
+
+   @Override
+   public long getTimeout() {
+  if (timeout == 0) {
+ timeout = TimeUnit.MINUTES.toMillis(2);
+  }
+  return timeout;
+   }
+
+   @Override
+   public CriticalAnalyzer addAction(CriticalAction action) {
+  this.actions.add(action);
+  return this;
+   }
+
+   @Override
+   public void check() {
+  boolean retry = true;
+  while (retry) {
+ try {
+for (CriticalComponent component : components) {
+
+   int pathReturned = component.isExpired(timeout);
+   if (pathReturned >= 0) {
+  fireAction(component, pathReturned);
+  // no need to keep running if there's already a 
component failed
+  return;
+   }
+}
+retry = false; // got to the end of the list, no need to retry
+ } catch (ConcurrentModificationException dontCare) {
+// lets retry on the loop
+ }
+  }
+   }
+
+   private void fireAction(CriticalComponent component, int path) {
+  for (CriticalAction action: actions) {
+ try {
+action.run(component, path);
+ } catch (Throwable e) {
+logger.warn(e.getMessage(), e);
+ }
+  }
+   }
+
+   @Override
+   public void start() {
+
+  if (!running.tryAcquire()) {
+ // already running
+ return;
+  }
+
+  // we are not using any Thread Pool or any Scheduled Executors from 
the ArtemisServer
+  // as that would defeat the purpose,
+  // as in 

[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115631#comment-16115631
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131534319
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java
 ---
@@ -2064,6 +2072,53 @@ public Configuration 
setNetworkCheckPing6Command(String command) {
   return this;
}
 
+   @Override
+   public boolean isAnalyzeCritical() {
+  return analyzeCritical;
+   }
+
+   @Override
+   public Configuration setAnalyzeCritical(boolean analyzeCritical) {
+  this.analyzeCritical = analyzeCritical;
+  return this;
+   }
+
+   @Override
+   public long getAnalyzeCriticalTimeout() {
+  return analyzeCriticalTimeout;
+   }
+
+   @Override
+   public Configuration setAnalyzeCriticalTimeout(long timeout) {
+  this.analyzeCriticalTimeout = timeout;
+  return this;
+   }
+
+   @Override
+   public long getAnalyzeCriticalCheckPeriod() {
+  if (analyzeCriticalCheckPeriod == 
ActiveMQDefaultConfiguration.getAnalyzeCriticalCheckPeriod()) {
--- End diff --

I was more getting at should the defaulting logic of using the timeout if 
the value is not set live in the config parser?


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115630#comment-16115630
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131534309
  
--- Diff: 
artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java
 ---
@@ -0,0 +1,182 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.activemq.artemis.utils.critical;
+
+import java.util.ConcurrentModificationException;
+import java.util.concurrent.CopyOnWriteArrayList;
+import java.util.concurrent.Semaphore;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet;
+import org.jboss.logging.Logger;
+
+public class CriticalAnalyzerImpl implements CriticalAnalyzer {
+
+   private final Logger logger = Logger.getLogger(CriticalAnalyzer.class);
+
+   private volatile long timeout;
--- End diff --

Is this one of these values not being updated by one thread and read by 
another then? I may have miss understood the code.


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115629#comment-16115629
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131534303
  
--- Diff: docs/user-manual/en/critical-analysis.md ---
@@ -0,0 +1,32 @@
+# Critical Analysis of the broker
+
+There are a few things that can go wrong on a production environment:
+
+- Bugs, for more than we try they still happen! We always try to correct 
them, but that's the only constant in software development.
+- IO Errors, disks and hardware can go bad
+- Memory issues, the CPU can go crazy by another process
+
+For cases like this, we added a protection to the broker to shut itself 
down when bad things happen.
+
+We measure time response in places like:
+
+- Queue delivery (add to the queue)
+- Journal storage
+- Paging operations
+
+If the response time goes beyond a configured timeout, the broker is 
considered unstable and an action will be taken to either shutdown the broker 
or halt the VM.
+
+You can use these following configuration options on broker.xml to 
configure how the critical analysis is performed.
+
+
+Name | Description
+:--- | :---
+analyze-critical | Enable or disable the critical analysis (default true)
+analyze-critical-timeout | Timeout used to do the critical analysis 
(default 12 milliseconds)
+analyze-critical-check-period | Time used to check the response times 
(default half of analyze-critical-timeout)
+analyze-critical-halt | Should the VM be halted upon failures (default 
false)
--- End diff --

I was more saying:

how does an outside observer view the state. Eg many people rely on logs or 
JMX 


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115627#comment-16115627
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131534288
  
--- Diff: 
artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java
 ---
@@ -0,0 +1,182 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.activemq.artemis.utils.critical;
+
+import java.util.ConcurrentModificationException;
+import java.util.concurrent.CopyOnWriteArrayList;
+import java.util.concurrent.Semaphore;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet;
+import org.jboss.logging.Logger;
+
+public class CriticalAnalyzerImpl implements CriticalAnalyzer {
+
+   private final Logger logger = Logger.getLogger(CriticalAnalyzer.class);
+
+   private volatile long timeout;
+
+   private volatile long checkTime;
+
+   @Override
+   public void clear() {
+  actions.clear();
+  components.clear();
+   }
+
+   private CopyOnWriteArrayList actions = new 
CopyOnWriteArrayList<>();
+
+   private Thread thread;
+
+   private final Semaphore running = new Semaphore(1);
+
+   private final ConcurrentHashSet components = new 
ConcurrentHashSet<>();
+
+   @Override
+   public void add(CriticalComponent component) {
+  components.add(component);
+   }
+
+   @Override
+   public void remove(CriticalComponent component) {
+  components.remove(component);
+   }
+
+   @Override
+   public CriticalAnalyzer setCheckTime(long timeout) {
+  this.checkTime = timeout;
+  return this;
+   }
+
+   @Override
+   public long getCheckTime() {
+  if (checkTime == 0) {
+ checkTime = getTimeout() / 2;
+  }
+  return checkTime;
+   }
+
+   @Override
+   public CriticalAnalyzer setTimeout(long timeout) {
+  this.timeout = timeout;
+  return this;
+   }
+
+   @Override
+   public long getTimeout() {
+  if (timeout == 0) {
+ timeout = TimeUnit.MINUTES.toMillis(2);
+  }
+  return timeout;
+   }
+
+   @Override
+   public CriticalAnalyzer addAction(CriticalAction action) {
+  this.actions.add(action);
+  return this;
+   }
+
+   @Override
+   public void check() {
+  boolean retry = true;
+  while (retry) {
+ try {
+for (CriticalComponent component : components) {
+
+   int pathReturned = component.isExpired(timeout);
+   if (pathReturned >= 0) {
+  fireAction(component, pathReturned);
+  // no need to keep running if there's already a 
component failed
+  return;
+   }
+}
+retry = false; // got to the end of the list, no need to retry
+ } catch (ConcurrentModificationException dontCare) {
+// lets retry on the loop
+ }
+  }
+   }
+
+   private void fireAction(CriticalComponent component, int path) {
+  for (CriticalAction action: actions) {
+ try {
+action.run(component, path);
+ } catch (Throwable e) {
+logger.warn(e.getMessage(), e);
+ }
+  }
+   }
+
+   @Override
+   public void start() {
+
+  if (!running.tryAcquire()) {
+ // already running
+ return;
+  }
+
+  // we are not using any Thread Pool or any Scheduled Executors from 
the ArtemisServer
+  // as that would defeat the purpose,
+  // as 

[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115579#comment-16115579
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131532700
  
--- Diff: docs/user-manual/en/critical-analysis.md ---
@@ -0,0 +1,32 @@
+# Critical Analysis of the broker
+
+There are a few things that can go wrong on a production environment:
+
+- Bugs, for more than we try they still happen! We always try to correct 
them, but that's the only constant in software development.
+- IO Errors, disks and hardware can go bad
+- Memory issues, the CPU can go crazy by another process
+
+For cases like this, we added a protection to the broker to shut itself 
down when bad things happen.
+
+We measure time response in places like:
+
+- Queue delivery (add to the queue)
+- Journal storage
+- Paging operations
+
+If the response time goes beyond a configured timeout, the broker is 
considered unstable and an action will be taken to either shutdown the broker 
or halt the VM.
+
+You can use these following configuration options on broker.xml to 
configure how the critical analysis is performed.
+
+
+Name | Description
+:--- | :---
+analyze-critical | Enable or disable the critical analysis (default true)
+analyze-critical-timeout | Timeout used to do the critical analysis 
(default 12 milliseconds)
+analyze-critical-check-period | Time used to check the response times 
(default half of analyze-critical-timeout)
+analyze-critical-halt | Should the VM be halted upon failures (default 
false)
--- End diff --

The idea here was to add monitoring on internal objects similar to what we 
do on pings / pongs through the protocols. I didn't want to make it any more 
complex than needed.

This subject could be raised to become an independent project if you keep 
investing here.


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115578#comment-16115578
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131532685
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ActiveMQServerImpl.java
 ---
@@ -477,12 +484,44 @@ public final synchronized void start() throws 
Exception {
   }
}
 
+   @Override
+   public CriticalAnalyzer getCriticalAnalyzer() {
+  return this.analyzer;
+   }
+
private void internalStart() throws Exception {
   if (state != SERVER_STATE.STOPPED) {
  logger.debug("Server already started!");
  return;
   }
 
+  if (!configuration.isAnalyzeCritical()) {
--- End diff --

I thought the configuration was being parsed inside ActiveMQServerImpl. 
(Bad memory, I forget about stuff months after working on stuff :) )... I will 
move it to constructor.


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115577#comment-16115577
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131532658
  
--- Diff: 
artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java
 ---
@@ -0,0 +1,182 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.activemq.artemis.utils.critical;
+
+import java.util.ConcurrentModificationException;
+import java.util.concurrent.CopyOnWriteArrayList;
+import java.util.concurrent.Semaphore;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet;
+import org.jboss.logging.Logger;
+
+public class CriticalAnalyzerImpl implements CriticalAnalyzer {
+
+   private final Logger logger = Logger.getLogger(CriticalAnalyzer.class);
+
+   private volatile long timeout;
--- End diff --

I don't need it to be atomic really. as soon as the dead lock stops 
updating this.. it would be enough to me to just have a volatile here.


The only overhead is the call to System.currentTimeMillis(); I did some 
tests and didn't see any regressions.


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115576#comment-16115576
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131532629
  
--- Diff: 
artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java
 ---
@@ -0,0 +1,182 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.activemq.artemis.utils.critical;
+
+import java.util.ConcurrentModificationException;
+import java.util.concurrent.CopyOnWriteArrayList;
+import java.util.concurrent.Semaphore;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet;
+import org.jboss.logging.Logger;
+
+public class CriticalAnalyzerImpl implements CriticalAnalyzer {
+
+   private final Logger logger = Logger.getLogger(CriticalAnalyzer.class);
+
+   private volatile long timeout;
+
+   private volatile long checkTime;
+
+   @Override
+   public void clear() {
+  actions.clear();
+  components.clear();
+   }
+
+   private CopyOnWriteArrayList actions = new 
CopyOnWriteArrayList<>();
+
+   private Thread thread;
+
+   private final Semaphore running = new Semaphore(1);
+
+   private final ConcurrentHashSet components = new 
ConcurrentHashSet<>();
+
+   @Override
+   public void add(CriticalComponent component) {
+  components.add(component);
+   }
+
+   @Override
+   public void remove(CriticalComponent component) {
+  components.remove(component);
+   }
+
+   @Override
+   public CriticalAnalyzer setCheckTime(long timeout) {
+  this.checkTime = timeout;
+  return this;
+   }
+
+   @Override
+   public long getCheckTime() {
+  if (checkTime == 0) {
+ checkTime = getTimeout() / 2;
+  }
+  return checkTime;
+   }
+
+   @Override
+   public CriticalAnalyzer setTimeout(long timeout) {
+  this.timeout = timeout;
+  return this;
+   }
+
+   @Override
+   public long getTimeout() {
+  if (timeout == 0) {
+ timeout = TimeUnit.MINUTES.toMillis(2);
+  }
+  return timeout;
+   }
+
+   @Override
+   public CriticalAnalyzer addAction(CriticalAction action) {
+  this.actions.add(action);
+  return this;
+   }
+
+   @Override
+   public void check() {
+  boolean retry = true;
+  while (retry) {
+ try {
+for (CriticalComponent component : components) {
+
+   int pathReturned = component.isExpired(timeout);
+   if (pathReturned >= 0) {
+  fireAction(component, pathReturned);
+  // no need to keep running if there's already a 
component failed
+  return;
+   }
+}
+retry = false; // got to the end of the list, no need to retry
+ } catch (ConcurrentModificationException dontCare) {
+// lets retry on the loop
+ }
+  }
+   }
+
+   private void fireAction(CriticalComponent component, int path) {
+  for (CriticalAction action: actions) {
+ try {
+action.run(component, path);
+ } catch (Throwable e) {
+logger.warn(e.getMessage(), e);
+ }
+  }
+   }
+
+   @Override
+   public void start() {
+
+  if (!running.tryAcquire()) {
+ // already running
+ return;
+  }
+
+  // we are not using any Thread Pool or any Scheduled Executors from 
the ArtemisServer
+  // as that would defeat the purpose,
+  // as in 

[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115575#comment-16115575
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131532619
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/deployers/impl/FileConfigurationParser.java
 ---
@@ -608,6 +608,14 @@ public void parseMainConfig(final Element e, final 
Configuration config) throws
 
   config.setNetworkCheckPingCommand(getString(e, 
"network-check-ping-command", config.getNetworkCheckPingCommand(), 
Validators.NO_CHECK));
 
+  config.setAnalyzeCritical(getBoolean(e, "analyze-critical", 
config.isAnalyzeCritical()));
+
+  config.setAnalyzeCriticalTimeout(getLong(e, 
"analyze-critical-timeout", config.getAnalyzeCriticalTimeout(), 
Validators.GE_ZERO));
+
+  config.setAnalyzeCriticalCheckPeriod(getLong(e, 
"analyze-critical-check-period", config.getAnalyzeCriticalCheckPeriod(), 
Validators.GE_ZERO));
--- End diff --

I didn't want to use null here. On this case, check period=0 would mean not 
set.


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115573#comment-16115573
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131532611
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java
 ---
@@ -2064,6 +2072,53 @@ public Configuration 
setNetworkCheckPing6Command(String command) {
   return this;
}
 
+   @Override
+   public boolean isAnalyzeCritical() {
+  return analyzeCritical;
+   }
+
+   @Override
+   public Configuration setAnalyzeCritical(boolean analyzeCritical) {
+  this.analyzeCritical = analyzeCritical;
+  return this;
+   }
+
+   @Override
+   public long getAnalyzeCriticalTimeout() {
+  return analyzeCriticalTimeout;
+   }
+
+   @Override
+   public Configuration setAnalyzeCriticalTimeout(long timeout) {
+  this.analyzeCriticalTimeout = timeout;
+  return this;
+   }
+
+   @Override
+   public long getAnalyzeCriticalCheckPeriod() {
+  if (analyzeCriticalCheckPeriod == 
ActiveMQDefaultConfiguration.getAnalyzeCriticalCheckPeriod()) {
--- End diff --

if 0, then the we would calculate the check-period.


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115569#comment-16115569
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user clebertsuconic commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131532564
  
--- Diff: docs/user-manual/en/critical-analysis.md ---
@@ -0,0 +1,32 @@
+# Critical Analysis of the broker
+
+There are a few things that can go wrong on a production environment:
+
+- Bugs, for more than we try they still happen! We always try to correct 
them, but that's the only constant in software development.
+- IO Errors, disks and hardware can go bad
+- Memory issues, the CPU can go crazy by another process
+
+For cases like this, we added a protection to the broker to shut itself 
down when bad things happen.
+
+We measure time response in places like:
+
+- Queue delivery (add to the queue)
+- Journal storage
+- Paging operations
+
+If the response time goes beyond a configured timeout, the broker is 
considered unstable and an action will be taken to either shutdown the broker 
or halt the VM.
+
+You can use these following configuration options on broker.xml to 
configure how the critical analysis is performed.
+
+
+Name | Description
+:--- | :---
+analyze-critical | Enable or disable the critical analysis (default true)
+analyze-critical-timeout | Timeout used to do the critical analysis 
(default 12 milliseconds)
+analyze-critical-check-period | Time used to check the response times 
(default half of analyze-critical-timeout)
+analyze-critical-halt | Should the VM be halted upon failures (default 
false)
--- End diff --

Shutdown. Server.stop.  But there is no guarantee it would work on 
deadlocks. 


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115531#comment-16115531
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131530954
  
--- Diff: docs/user-manual/en/critical-analysis.md ---
@@ -0,0 +1,32 @@
+# Critical Analysis of the broker
+
+There are a few things that can go wrong on a production environment:
+
+- Bugs, for more than we try they still happen! We always try to correct 
them, but that's the only constant in software development.
+- IO Errors, disks and hardware can go bad
+- Memory issues, the CPU can go crazy by another process
+
+For cases like this, we added a protection to the broker to shut itself 
down when bad things happen.
+
+We measure time response in places like:
+
+- Queue delivery (add to the queue)
+- Journal storage
+- Paging operations
+
+If the response time goes beyond a configured timeout, the broker is 
considered unstable and an action will be taken to either shutdown the broker 
or halt the VM.
+
+You can use these following configuration options on broker.xml to 
configure how the critical analysis is performed.
+
+
+Name | Description
+:--- | :---
+analyze-critical | Enable or disable the critical analysis (default true)
+analyze-critical-timeout | Timeout used to do the critical analysis 
(default 12 milliseconds)
+analyze-critical-check-period | Time used to check the response times 
(default half of analyze-critical-timeout)
+analyze-critical-halt | Should the VM be halted upon failures (default 
false)
--- End diff --

what occurs if you have this to no halt? does it log out, also is it 
available as a jmx endpoint so users monitoring systems using the exposed jmx 
endpoints, could be configured to check and alert also to their operational 
support teams.


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115528#comment-16115528
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131530849
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ActiveMQServerImpl.java
 ---
@@ -477,12 +484,44 @@ public final synchronized void start() throws 
Exception {
   }
}
 
+   @Override
+   public CriticalAnalyzer getCriticalAnalyzer() {
+  return this.analyzer;
+   }
+
private void internalStart() throws Exception {
   if (state != SERVER_STATE.STOPPED) {
  logger.debug("Server already started!");
  return;
   }
 
+  if (!configuration.isAnalyzeCritical()) {
--- End diff --

why not do this within object construction? 


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115532#comment-16115532
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131530890
  
--- Diff: 
artemis-server/src/test/resources/ConfigurationTest-full-config.xml ---
@@ -57,6 +57,10 @@
   1234567
   37
   123
+  true
--- End diff --

between documentation you name this Critical Analysis, then in config etc 
you switch round the words to analyse critical. suggest keeping naming 
constant. 


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115529#comment-16115529
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131530870
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ActiveMQServerImpl.java
 ---
@@ -477,12 +484,44 @@ public final synchronized void start() throws 
Exception {
   }
}
 
+   @Override
+   public CriticalAnalyzer getCriticalAnalyzer() {
+  return this.analyzer;
+   }
+
private void internalStart() throws Exception {
   if (state != SERVER_STATE.STOPPED) {
  logger.debug("Server already started!");
  return;
   }
 
+  if (!configuration.isAnalyzeCritical()) {
+ this.analyzer = new EmptyCriticalAnalyzer();
+  }
+
+  /** Calling this for cases where the server was stopped and now is 
being restarted... failback, etc...*/
+  this.analyzer.clear();
+
+  
this.getCriticalAnalyzer().setCheckTime(configuration.getAnalyzeCriticalTimeout()).setTimeout(configuration.getAnalyzeCriticalTimeout());
--- End diff --

again why not do this in the constructor?


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115530#comment-16115530
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131530746
  
--- Diff: 
artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java
 ---
@@ -0,0 +1,182 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.activemq.artemis.utils.critical;
+
+import java.util.ConcurrentModificationException;
+import java.util.concurrent.CopyOnWriteArrayList;
+import java.util.concurrent.Semaphore;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet;
+import org.jboss.logging.Logger;
+
+public class CriticalAnalyzerImpl implements CriticalAnalyzer {
+
+   private final Logger logger = Logger.getLogger(CriticalAnalyzer.class);
+
+   private volatile long timeout;
--- End diff --

if this is on hot path, why not use an atomic or have use 
atomicfieldupdater (same with check time) 

http://normanmaurer.me/blog/2013/10/28/Lesser-known-concurrent-classes-Part-1/



> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115525#comment-16115525
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/1443
  
What is the over head of this, latency and throughput? 


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115522#comment-16115522
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131530423
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/deployers/impl/FileConfigurationParser.java
 ---
@@ -608,6 +608,14 @@ public void parseMainConfig(final Element e, final 
Configuration config) throws
 
   config.setNetworkCheckPingCommand(getString(e, 
"network-check-ping-command", config.getNetworkCheckPingCommand(), 
Validators.NO_CHECK));
 
+  config.setAnalyzeCritical(getBoolean(e, "analyze-critical", 
config.isAnalyzeCritical()));
+
+  config.setAnalyzeCriticalTimeout(getLong(e, 
"analyze-critical-timeout", config.getAnalyzeCriticalTimeout(), 
Validators.GE_ZERO));
+
+  config.setAnalyzeCriticalCheckPeriod(getLong(e, 
"analyze-critical-check-period", config.getAnalyzeCriticalCheckPeriod(), 
Validators.GE_ZERO));
--- End diff --

as per other review comment should the default not be half the timeout  if 
it is not set, not some arbitarty number which then forces the section of code 
later.


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115521#comment-16115521
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131530496
  
--- Diff: 
artemis-commons/src/main/java/org/apache/activemq/artemis/utils/critical/CriticalAnalyzerImpl.java
 ---
@@ -0,0 +1,182 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.activemq.artemis.utils.critical;
+
+import java.util.ConcurrentModificationException;
+import java.util.concurrent.CopyOnWriteArrayList;
+import java.util.concurrent.Semaphore;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet;
+import org.jboss.logging.Logger;
+
+public class CriticalAnalyzerImpl implements CriticalAnalyzer {
+
+   private final Logger logger = Logger.getLogger(CriticalAnalyzer.class);
+
+   private volatile long timeout;
+
+   private volatile long checkTime;
+
+   @Override
+   public void clear() {
+  actions.clear();
+  components.clear();
+   }
+
+   private CopyOnWriteArrayList actions = new 
CopyOnWriteArrayList<>();
+
+   private Thread thread;
+
+   private final Semaphore running = new Semaphore(1);
+
+   private final ConcurrentHashSet components = new 
ConcurrentHashSet<>();
+
+   @Override
+   public void add(CriticalComponent component) {
+  components.add(component);
+   }
+
+   @Override
+   public void remove(CriticalComponent component) {
+  components.remove(component);
+   }
+
+   @Override
+   public CriticalAnalyzer setCheckTime(long timeout) {
+  this.checkTime = timeout;
+  return this;
+   }
+
+   @Override
+   public long getCheckTime() {
+  if (checkTime == 0) {
+ checkTime = getTimeout() / 2;
+  }
+  return checkTime;
+   }
+
+   @Override
+   public CriticalAnalyzer setTimeout(long timeout) {
+  this.timeout = timeout;
+  return this;
+   }
+
+   @Override
+   public long getTimeout() {
+  if (timeout == 0) {
+ timeout = TimeUnit.MINUTES.toMillis(2);
+  }
+  return timeout;
+   }
+
+   @Override
+   public CriticalAnalyzer addAction(CriticalAction action) {
+  this.actions.add(action);
+  return this;
+   }
+
+   @Override
+   public void check() {
+  boolean retry = true;
+  while (retry) {
+ try {
+for (CriticalComponent component : components) {
+
+   int pathReturned = component.isExpired(timeout);
+   if (pathReturned >= 0) {
+  fireAction(component, pathReturned);
+  // no need to keep running if there's already a 
component failed
+  return;
+   }
+}
+retry = false; // got to the end of the list, no need to retry
+ } catch (ConcurrentModificationException dontCare) {
+// lets retry on the loop
+ }
+  }
+   }
+
+   private void fireAction(CriticalComponent component, int path) {
+  for (CriticalAction action: actions) {
+ try {
+action.run(component, path);
+ } catch (Throwable e) {
+logger.warn(e.getMessage(), e);
+ }
+  }
+   }
+
+   @Override
+   public void start() {
+
+  if (!running.tryAcquire()) {
+ // already running
+ return;
+  }
+
+  // we are not using any Thread Pool or any Scheduled Executors from 
the ArtemisServer
+  // as that would defeat the purpose,
+  // as 

[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115523#comment-16115523
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1443#discussion_r131530398
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/config/impl/ConfigurationImpl.java
 ---
@@ -2064,6 +2072,53 @@ public Configuration 
setNetworkCheckPing6Command(String command) {
   return this;
}
 
+   @Override
+   public boolean isAnalyzeCritical() {
+  return analyzeCritical;
+   }
+
+   @Override
+   public Configuration setAnalyzeCritical(boolean analyzeCritical) {
+  this.analyzeCritical = analyzeCritical;
+  return this;
+   }
+
+   @Override
+   public long getAnalyzeCriticalTimeout() {
+  return analyzeCriticalTimeout;
+   }
+
+   @Override
+   public Configuration setAnalyzeCriticalTimeout(long timeout) {
+  this.analyzeCriticalTimeout = timeout;
+  return this;
+   }
+
+   @Override
+   public long getAnalyzeCriticalCheckPeriod() {
+  if (analyzeCriticalCheckPeriod == 
ActiveMQDefaultConfiguration.getAnalyzeCriticalCheckPeriod()) {
--- End diff --

how would you differentiate between it being the default and someone 
actually have set it, should this defaulting not occur in config/broker.xml 
reading where if null then analyzeCriticalCheckPeriod is defaulted to half 
timeout?


> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARTEMIS-1324) Critical Analysis and deadlock detection on broker

2017-08-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115278#comment-16115278
 ] 

ASF GitHub Bot commented on ARTEMIS-1324:
-

GitHub user clebertsuconic opened a pull request:

https://github.com/apache/activemq-artemis/pull/1443

ARTEMIS-1324 Deadlock detection and health check of critical components



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/clebertsuconic/activemq-artemis 
critical-prototype

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/activemq-artemis/pull/1443.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1443


commit 58474c42b74d31e4727d4da58553c01533d4f4e6
Author: Clebert Suconic 
Date:   2017-08-05T04:52:42Z

ARTEMIS-1324 Deadlock detection and health check of critical components




> Critical Analysis and deadlock detection on broker
> --
>
> Key: ARTEMIS-1324
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1324
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: clebert suconic
>Assignee: clebert suconic
>Priority: Critical
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)