RE: [EXTERNAL] Re: Artemis file locking not released

Gunawan, Rahman (GSFC-703.H)[Halvik Corp] Tue, 01 Mar 2022 09:03:46 -0800

Attached is the Artemis v2.19.1 log from when it was terminated.  Should the 
server have gone to sleep when it lost its connection to NFS/network and woken 
up when the connection recovered? 
In replication mode, the server went to sleep when it lost access to the 
network, then woke up when it recovered access to the network.


-----Original Message-----
From: Justin Bertram <jbert...@apache.org> 
Sent: Monday, February 28, 2022 1:49 PM
To: users@activemq.apache.org
Subject: Re: [EXTERNAL] Re: Artemis file locking not released

> Why was the primary server completely down when it was isolated from
> the network?

I can't really say since you've not really provided any details about this.
However, I would guess that since the journal is on NFS and since you killed 
the broker's network then it encountered a critical IO error and shut itself 
down. This is the expected behavior.

> I configured <network-check-list>, enabled ,
> <network-check-ping-command> and <network-check-ping6-command> so the
> primary server knew that the network was unhealthy as shown in below log...

I've not seen the network pinger enabled for a shared-store configuration as it 
was explicitly designed for the replicated (i.e. shared nothing) configuration 
to avoid split-brain. In the shared-store configuration the shared-store itself 
mitigates against split-brain (e.g. via file locks). I don't believe you need 
to configure the network pinger given your use of shared-store.
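
To illustrate the file-lock idea mentioned above, here is a minimal sketch using plain java.nio file locks on a local temp file. It is an assumption-laden illustration, not Artemis's implementation: the class and method names (SharedStoreLockSketch, tryAcquire, release) are hypothetical, and how a lock behaves on NFS depends on the NFS configuration. It only shows that a second party cannot take the lock until the current holder releases it.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; this is not Artemis code.
final class SharedStoreLockSketch {

    // Try once to take an exclusive lock on the given file.
    // Returns the lock, or null if someone else already holds it.
    static FileLock tryAcquire(Path lockFile) throws IOException {
        FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            FileLock lock = channel.tryLock(); // null if held by another process
            if (lock == null) {
                channel.close();
            }
            return lock;
        } catch (OverlappingFileLockException e) {
            // The lock is already held elsewhere in this same JVM.
            channel.close();
            return null;
        }
    }

    // Release the lock and close the channel it was taken on.
    static void release(FileLock lock) throws IOException {
        FileChannel channel = lock.channel();
        lock.release();
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path lockFile = Files.createTempFile("live", ".lock");

        // The "live" side takes the lock first.
        FileLock live = tryAcquire(lockFile);
        System.out.println("live holds lock: " + (live != null));

        // The "backup" side cannot take it while the live side holds it.
        System.out.println("backup got lock: " + (tryAcquire(lockFile) != null));

        // Only after the holder releases can the backup side acquire it.
        release(live);
        FileLock backup = tryAcquire(lockFile);
        System.out.println("backup got lock after release: " + (backup != null));

        release(backup);
        Files.delete(lockFile);
    }
}
```

In this sketch the waiting side has no way to force the lock open; it can only retry until the holder (or, on a real shared store, the file system) lets go.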


Justin

On Mon, Feb 28, 2022 at 11:34 AM Gunawan, Rahman (GSFC-703.H)[Halvik Corp] 
<rahman.guna...@nasa.gov.invalid> wrote:

> We'll take a look at the NFS configuration.  Why was the primary 
> server completely down when it was isolated from the network?  I 
> configured <network-check-list>, enabled , 
> <network-check-ping-command> and <network-check-ping6-command> so the 
> primary server knew that the network was unhealthy as shown in below log:
> [org.apache.activemq.artemis.logs] AMQ201001: Network is unhealthy, 
> stopping service ActiveMQServerImpl
>
> However, when we re-enabled the network card, the primary server was
> completely down.  I had to start the primary server manually.
>
> Regards,
> Rahman
>
> -----Original Message-----
> From: Justin Bertram <jbert...@apache.org>
> Sent: Monday, February 28, 2022 10:15 AM
> To: users@activemq.apache.org
> Subject: Re: [EXTERNAL] Re: Artemis file locking not released
>
> The backup and the live do have a direct connection. This allows the 
> backup to share its connection details with the live. The live then 
> takes those details and passes them on to clients so that the clients 
> will know where to connect in case the live fails.
>
> However, if this connection breaks it is *not* possible for the backup
> to simply "unlock" the journal and take over. The only entities which
> can unlock the journal are the live broker (which created the lock in
> the first place) or NFS itself (e.g. in the case of some kind of
> connectivity failure). If the lock is not being released when the live
> broker's NFS connectivity fails then I would suggest you have a problem
> with your NFS configuration.
>
>
> Justin
>
> On Mon, Feb 28, 2022 at 6:55 AM Gunawan, Rahman (GSFC-703.H)[Halvik 
> Corp] <rahman.guna...@nasa.gov.invalid> wrote:
>
> > The backup server knew that the primary server had a problem.  Below
> > is an excerpt from the backup server's log:
> > ERROR [org.apache.activemq.artemis.core.client] AMQ214016: Failed to 
> > create netty connection: java.net.UnknownHostException
> >
> > Thus, I'm thinking that if the Artemis primary server loses its
> > connection to NFS or the network, the backup server could detect
> > this, unlock the file and take over.
> > Please let me know if you have suggestions.
> > Thanks
> >
> > Regards,
> > Rahman
> >
> > -----Original Message-----
> > From: Clebert Suconic <clebert.suco...@gmail.com>
> > Sent: Saturday, February 26, 2022 9:27 AM
> > To: users@activemq.apache.org
> > Subject: [EXTERNAL] Re: Artemis file locking not released
> >
> > Could it be some configuration of the remote file system attributes?
> >
> > On Fri, Feb 25, 2022 at 12:03 PM Gunawan, Rahman (GSFC-703.H)[Halvik 
> > Corp] <rahman.guna...@nasa.gov.invalid> wrote:
> >
> > > I'm using Artemis 2.19.1.  I'm using a shared-store configuration and
> > > testing a scenario where the primary Artemis server is isolated
> > > from the network by disabling the network card.  Because the
> > > primary server lost communication with NFS, the file is never
> > > unlocked and the backup server waits for the lock indefinitely.  When
> > > we re-enable the network card on the primary server, the primary
> > > server is completely down.  Below is the primary server log:
> > > "Reference Handler" Id=2 WAITING on
> java.lang.ref.Reference$Lock@64b6b3fc
> > >         at java.lang.Object.wait(Native Method)
> > >         -  waiting on java.lang.ref.Reference$Lock@64b6b3fc
> > >         at java.lang.Object.wait(Object.java:502)
> > >         at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
> > >         at
> > > java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
> > >
> > >
> > >
> > > ==================================================================
> > > ==
> > > ==
> > > =========
> > > End Thread dump
> > >
> > > Is this a bug in the Artemis shared-store configuration?
> > >
> > > Regards,
> > > Rahman
> > >
> > --
> > Clebert Suconic
> >
>
