[jira] [Commented] (AMQ-9482) Broker crashes after runaway threads spawn

2024-05-19 Thread Christopher L. Shannon (Jira)


[ 
https://issues.apache.org/jira/browse/AMQ-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847737#comment-17847737
 ] 

Christopher L. Shannon commented on AMQ-9482:
-

It seems like one of the connections may be stuck and not reading (although 
there should be a timeout). I also wonder if the burst of connections happens 
so quickly that it exhausts memory. If the broker runs out of memory it will 
get into a weird state, so you may need to bump memory to handle the spike.

You could try tweaking some settings with the transport:

[https://activemq.apache.org/components/classic/documentation/tcp-transport-reference]

I would try setting the {{soTimeout}} and {{soWriteTimeout}} which would 
hopefully prevent issues with connections blocking forever on a read/write to a 
socket.
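
For reference, a minimal sketch of how those two options might be appended to a connector URI. Everything concrete here is an assumption for illustration: the {{mqtt+nio://0.0.0.0:1883}} base URI, the 60-second values, and the helper name. The {{transport.}} prefix follows the linked TCP reference for options applied on the broker side; whether the MQTT connector passes them through unchanged should be checked against that page.

```java
// Hypothetical helper: builds a transportConnector URI carrying the two
// socket timeouts. It only concatenates strings; it does not touch a broker.
final class ConnectorUri {

    // Appends transport.soTimeout / transport.soWriteTimeout (milliseconds)
    // to baseUri, using '?' or '&' depending on whether a query already exists.
    static String withSocketTimeouts(String baseUri, int soTimeoutMs, int soWriteTimeoutMs) {
        String separator = baseUri.contains("?") ? "&" : "?";
        return baseUri + separator
                + "transport.soTimeout=" + soTimeoutMs
                + "&transport.soWriteTimeout=" + soWriteTimeoutMs;
    }

    public static void main(String[] args) {
        // Assumed connector address and timeout values, not taken from the report.
        System.out.println(withSocketTimeouts("mqtt+nio://0.0.0.0:1883", 60_000, 60_000));
    }
}
```

When the resulting URI is pasted into {{activemq.xml}}, each {{&}} has to be written as {{&amp;}} because the file is XML.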

You could also try tuning things for the SelectorManager: 
https://activemq.apache.org/components/classic/documentation/nio-transport-reference

 

> Broker crashes after runaway threads spawn
> --
>
> Key: AMQ-9482
> URL: https://issues.apache.org/jira/browse/AMQ-9482
> Project: ActiveMQ Classic
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 5.17.6, 6.0.1
> Environment: Bitnami created AMI in AWS
>Reporter: Tom Tichy
>Priority: Major
> Attachments: activemq.tdump, brokerInfo-after-crash-redacted.json
>
>
> Running on a Bitnami-created AMI in AWS. The broker has about 7000 devices 
> connected via MQTT. Each device has its own topic name.
> The broker stays up for about 4-5 days before being hobbled and unable to 
> create any new tasks or accept any new connections.
> (There is an identical setup for the staging environment with about 100 
> devices connected. It runs without any issues.)
> I have traced the cause to the systemd task limit. The current 
> `TasksMax` is 18100. When running normally, the number of tasks is about 300. 
> Then (every 4-5 days) there is a quick spike to the max of 18100 tasks, and it 
> stays there, never coming back down. The result is that the broker just sits 
> there, does nothing useful, and keeps logging the following message:
>  
> {code:java}
> [659914.788s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
> [659914.788s][warning][os,thread] Failed to start the native thread for java.lang.Thread "ActiveMQ BrokerService[localhost] Task-281805"
> ERROR | Scheduled task error
> java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
>         at java.lang.Thread.start0(Native Method) ~[?:?]
>         at java.lang.Thread.start(Thread.java:809) ~[?:?]
>         at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:945) ~[?:?]
>         at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364) ~[?:?]
>         at org.apache.activemq.thread.TaskRunnerFactory.execute(TaskRunnerFactory.java:173) ~[activemq-client-6.0.1.jar:6.0.1]
>         at org.apache.activemq.thread.TaskRunnerFactory.execute(TaskRunnerFactory.java:165) ~[activemq-client-6.0.1.jar:6.0.1]
>         at org.apache.activemq.broker.region.Topic$7.run(Topic.java:820) ~[activemq-broker-6.0.1.jar:6.0.1]
>         at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:39) ~[activemq-client-6.0.1.jar:6.0.1]
>         at java.util.TimerThread.mainLoop(Timer.java:566) ~[?:?]
>         at java.util.TimerThread.run(Timer.java:516) ~[?:?]
> Exception in thread "ActiveMQ Broker[localhost] Scheduler" java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
>         at java.base/java.lang.Thread.start0(Native Method)
>         at java.base/java.lang.Thread.start(Thread.java:809)
>         at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:945)
>         at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364)
>         at org.apache.activemq.thread.TaskRunnerFactory.execute(TaskRunnerFactory.java:173)
>         at org.apache.activemq.thread.TaskRunnerFactory.execute(TaskRunnerFactory.java:165)
>         at org.apache.activemq.broker.region.Topic$7.run(Topic.java:820)
>         at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:39)
>         at java.base/java.util.TimerThread.mainLoop(Timer.java:566)
>         at java.base/java.util.TimerThread.run(Timer.java:516)
>  {code}
>  
> The start command is 
> {code:java}
> /opt/bitnami/java/bin/java -Xms2G -Xmx4G 
> -Djava.util.logging.config.file=logging.properties 
> 

[jira] [Commented] (AMQ-9482) Broker crashes after runaway threads spawn

2024-05-17 Thread Tom Tichy (Jira)


[ 
https://issues.apache.org/jira/browse/AMQ-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847223#comment-17847223
 ] 

Tom Tichy commented on AMQ-9482:


Here is a thread dump that I managed to get. ActiveMQ was totally out to lunch 
and reporting nothing but

{code:java}
WARN | Transport Connection to: tcp://10.0.88.139:35854 failed: CONNECT frame not received with in connectionTimeout (>3): tcp://10.0.88.139:35854
{code}
Fastthread.io analyzer came up with this

 # 'Selector Worker: 12' thread is stuck on *_wait()_* method in 
*_sun.nio.ch.EPoll_* file. Before getting stuck, this thread obtained *5 locks* 
(sun.nio.ch.Util$2 lock, sun.nio.ch.EPollSelectorImpl lock...) and never 
released it. Due to that *4480 threads are* BLOCKED as shown in this [+stack 
trace+|https://fastthread.io/thread.jsp?tId=MHg0ODMwZQ===3=1=true=2024-05-17T10-07-42].
 If threads are BLOCKED for a prolonged period, your application can become 
unresponsive.
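
A quick way to cross-check a number like the 4480 BLOCKED threads without an online analyzer is to tally the {{java.lang.Thread.State:}} lines in the dump. A minimal sketch, assuming the attached dump is in the standard HotSpot {{jstack}} text format; the class name is hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts threads per state in a jstack-style text dump.
final class ThreadDumpStates {

    // Matches lines such as "   java.lang.Thread.State: BLOCKED (on object monitor)"
    // and captures only the state name (RUNNABLE, BLOCKED, TIMED_WAITING, ...).
    private static final Pattern STATE =
            Pattern.compile("java\\.lang\\.Thread\\.State: (\\w+)");

    static Map<String, Integer> countStates(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher matcher = STATE.matcher(dump);
        while (matcher.find()) {
            counts.merge(matcher.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java ThreadDumpStates.java <dump-file>");
            return;
        }
        // Requires Java 11+ for Files.readString.
        String dump = Files.readString(Path.of(args[0]));
        countStates(dump).forEach((state, n) -> System.out.println(state + " " + n));
    }
}
```

Threads that appear in the dump without a state line (some JVM-internal threads) are simply not counted.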

Any ideas [~cshannon] ?

[^activemq.tdump]


[jira] [Commented] (AMQ-9482) Broker crashes after runaway threads spawn

2024-04-29 Thread Tom Tichy (Jira)


[ 
https://issues.apache.org/jira/browse/AMQ-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841923#comment-17841923
 ] 

Tom Tichy commented on AMQ-9482:


Ok. Let me get a thread dump. The challenge is that since this is production, 
I deal with the issue by immediately restarting the broker and switching to the 
secondary.

> Broker crashes after runaway threads spawn
> --
>
> Key: AMQ-9482
> URL: https://issues.apache.org/jira/browse/AMQ-9482
> Project: ActiveMQ Classic
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 5.17.6, 6.0.1
> Environment: Bitnami created AMI in AWS
>Reporter: Tom Tichy
>Priority: Major
> Attachments: brokerInfo-after-crash-redacted.json
>
>
> Running on a Bitnami-created AMI in AWS. The broker has about 7000 devices 
> connected via MQTT. Each device has its own topic name.
> The broker stays up for about 4-5 days before being hobbled and unable to 
> create any new tasks or accept any new connections.
> (There is an identical setup for the staging environment with about 100 
> devices connected. It runs without any issues.)
> I have traced the cause to the systemd task limit. The current 
> `TasksMax` is 18100. When running normally, the number of tasks is about 300. 
> Then (every 4-5 days) there is a quick spike to the max of 18100 tasks, and it 
> stays there, never coming back down. The result is that the broker just sits 
> there, does nothing useful, and keeps logging the following message:
>  
> {code:java}
> [659914.788s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
> [659914.788s][warning][os,thread] Failed to start the native thread for java.lang.Thread "ActiveMQ BrokerService[localhost] Task-281805"
> ERROR | Scheduled task error
> java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
>         at java.lang.Thread.start0(Native Method) ~[?:?]
>         at java.lang.Thread.start(Thread.java:809) ~[?:?]
>         at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:945) ~[?:?]
>         at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364) ~[?:?]
>         at org.apache.activemq.thread.TaskRunnerFactory.execute(TaskRunnerFactory.java:173) ~[activemq-client-6.0.1.jar:6.0.1]
>         at org.apache.activemq.thread.TaskRunnerFactory.execute(TaskRunnerFactory.java:165) ~[activemq-client-6.0.1.jar:6.0.1]
>         at org.apache.activemq.broker.region.Topic$7.run(Topic.java:820) ~[activemq-broker-6.0.1.jar:6.0.1]
>         at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:39) ~[activemq-client-6.0.1.jar:6.0.1]
>         at java.util.TimerThread.mainLoop(Timer.java:566) ~[?:?]
>         at java.util.TimerThread.run(Timer.java:516) ~[?:?]
> Exception in thread "ActiveMQ Broker[localhost] Scheduler" java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
>         at java.base/java.lang.Thread.start0(Native Method)
>         at java.base/java.lang.Thread.start(Thread.java:809)
>         at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:945)
>         at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364)
>         at org.apache.activemq.thread.TaskRunnerFactory.execute(TaskRunnerFactory.java:173)
>         at org.apache.activemq.thread.TaskRunnerFactory.execute(TaskRunnerFactory.java:165)
>         at org.apache.activemq.broker.region.Topic$7.run(Topic.java:820)
>         at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:39)
>         at java.base/java.util.TimerThread.mainLoop(Timer.java:566)
>         at java.base/java.util.TimerThread.run(Timer.java:516)
>  {code}
>  
> The start command is 
> {code:java}
> /opt/bitnami/java/bin/java -Xms2G -Xmx4G 
> -Djava.util.logging.config.file=logging.properties 
> -Djava.security.auth.login.config=/opt/bitnami/activemq/conf/login.config 
> -Dorg.apache.activemq.UseDedicatedTaskRunner=false 
> -Dcom.sun.management.jmxremote -Djava.awt.headless=true 
> -Djava.io.tmpdir=/opt/bitnami/activemq/tmp --add-reads=java.xml=java.logging 
> --add-opens java.base/java.security=ALL-UNNAMED --add-opens 
> java.base/java.net=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED 
> --add-opens java.base/java.util=ALL-UNNAMED --add-opens 
> java.naming/javax.naming.spi=ALL-UNNAMED --add-opens 
> java.rmi/sun.rmi.transport.tcp=ALL-UNNAMED --add-opens 
> java.base/java.util.concurrent=ALL-UNNAMED --add-opens 
> java.base/java.util.concurrent.atomic=ALL-UNNAMED 
> 

[jira] [Commented] (AMQ-9482) Broker crashes after runaway threads spawn

2024-04-21 Thread Christopher L. Shannon (Jira)


[ 
https://issues.apache.org/jira/browse/AMQ-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839394#comment-17839394
 ] 

Christopher L. Shannon commented on AMQ-9482:
-

To look at something like this, more information is needed, such as a thread 
dump to see what is going on. From the description it seems like a bunch of 
tasks are running and not completing, which makes me think there might be a 
deadlock or something else blocking the tasks, but without a thread dump 
showing what all the tasks are doing it's impossible to tell.
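
For a running broker the usual route is {{jstack <pid>}} or {{jcmd <pid> Thread.print}} from the JDK. The same information is available in-process through {{java.lang.management.ThreadMXBean}}; the sketch below dumps the threads of the JVM it runs in and reports the JDK's own deadlock check. It is illustrative only, not something ActiveMQ ships, and the class name is made up.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: produces a thread dump of the current JVM.
final class InProcessThreadDump {

    // Returns one text block per live thread, including held monitors and
    // ownable synchronizers. ThreadInfo.toString() abbreviates deep stacks,
    // so jstack output is more complete for long traces.
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            out.append(info);
        }
        return out.toString();
    }

    // Number of threads the JVM reports as deadlocked on monitors or
    // ownable synchronizers; findDeadlockedThreads() returns null when none.
    static int deadlockedThreadCount() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.print(dump());
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
    }
}
```

A deadlock count of zero does not rule out the kind of pile-up described here, since threads blocked behind one long-held lock are not a deadlock in the JDK's sense.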
