[jira] [Commented] (AMQ-5540) KahaDB can't fail over to the slave if the master is unable to write to disk

Jason Gantner (Jira) Wed, 14 Apr 2021 02:48:04 -0700


    [ 
https://issues.apache.org/jira/browse/AMQ-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320860#comment-17320860
 ]


Jason Gantner commented on AMQ-5540:
------------------------------------

This issue is still present as of version 5.15.13, with the same behaviour.
A single I/O failure triggers a shutdown, but the shutdown never completes (the 
process either deadlocks or does not finish the routine) because KahaDB is 
missing a PageFile as a result of the earlier I/O error.
We end up with a "frozen" master that still holds the lock on the DB and a 
slave waiting for that lock to be released.
A manual `activemq restart` solves the problem, but we lose the quick reaction 
time offered by the HA mode.
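
To illustrate why a frozen master blocks the slave, here is a minimal sketch 
using only plain java.nio file locks. It is not ActiveMQ's locker code; the 
class name SharedLockSketch, the temp lock file, and the "master"/"slave" 
channels are hypothetical stand-ins. It assumes the shared store is guarded by 
an exclusive file lock, and it simulates both brokers inside one JVM, where a 
competing lock attempt surfaces as OverlappingFileLockException rather than a 
null return.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SharedLockSketch {

    /** Try to take the exclusive lock; return null if it is already held. */
    static FileLock tryAcquire(FileChannel channel) throws IOException {
        try {
            // Returns null when another process holds the lock.
            return channel.tryLock();
        } catch (OverlappingFileLockException e) {
            // Same-JVM simulation only: another channel in this JVM holds it.
            return null;
        }
    }

    /** Returns { slaveGotLockWhileMasterHeldIt, slaveGotLockAfterRelease }. */
    static boolean[] demo() throws IOException {
        // Hypothetical lock file standing in for the lock in the shared store.
        Path lockFile = Files.createTempFile("kahadb-sketch", ".lock");
        try (FileChannel master = FileChannel.open(lockFile, StandardOpenOption.WRITE);
             FileChannel slave = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {

            FileLock masterLock = tryAcquire(master);

            // "Frozen" master: still alive and still holding the lock.
            boolean whileHeld = tryAcquire(slave) != null;

            // What a clean shutdown (or process exit) would do.
            masterLock.release();

            FileLock slaveLock = tryAcquire(slave);
            boolean afterRelease = slaveLock != null;
            if (slaveLock != null) {
                slaveLock.release();
            }
            return new boolean[] { whileHeld, afterRelease };
        } finally {
            Files.deleteIfExists(lockFile);
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] result = demo();
        System.out.println("slave acquired lock while master held it: " + result[0]);
        System.out.println("slave acquired lock after master released it: " + result[1]);
    }
}
```

As long as the master process stays alive without releasing the lock, every 
attempt by the slave fails, which matches the behaviour described above: only 
a release or the end of the master process lets the slave take over.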

> KahaDB can't fail over to the slave if the master is unable to write to disk
> ----------------------------------------------------------------------------
>
>                 Key: AMQ-5540
>                 URL: https://issues.apache.org/jira/browse/AMQ-5540
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker, Message Store
>    Affects Versions: 5.10.0
>         Environment: Using Master-slave topology with shared kahadb. 
> Using KahaDB on NFS. 
>            Reporter: Anuj Khandelwal
>            Priority: Major
>         Attachments: ActiveMQ_config.xml, Logs.txt
>
>
> This is coming from 
> http://activemq.2283324.n4.nabble.com/kahadb-corruption-quot-Checkpoint-failed-java-io-IOException-Input-output-error-quot-td4690378.html#a4690442
> Scenario: We had a failure on the filer, because of which the application 
> (ActiveMQ) was not able to read from or write to kahadb. I have attached the 
> logs to show the details. The master broker was not completely killed. The 
> master stopped its transport connectors and plugins, but it didn't release 
> its lock on the kahadb. I have checked with the "ps" command that the master 
> broker was still running. And since the master didn't release the lock on 
> kahadb, the slave broker was not able to acquire the lock. 
> The master broker should shut down properly in such cases and let the slave 
> take over the persistence store. 
> Thanks,
> Anuj



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
