Re: Possible deadlock with MQTT

Takeshi Fukushima Thu, 20 Jan 2022 11:31:36 -0800

Hi Justin

I realize that my fix is not the best, I just don't know the codebase well
enough to do a better job. That said, the methods that I've marked as
synchronized (MQTTConnectionManager connect and disconnect) already
synchronize themselves on the session state, but the disconnect method
synchronizes on the sessionState, which might get swapped before it has a
chance to call (synchronized) MQTTSession.stop (which in turn, calls the
synchronized MQTTSession.clear but on a different instance of the session).
On top of that, since a freshly created MQTTSession uses a global
MQTTSessionState to begin with (and some methods synchronizes over the
state), I think this causes more (unintended) contention and may be another
source of deadlocks.


I've looked at your PR and, unfortunately, the issue still remains as far
as I can tell. I will be happy to help if you need anything and I will do
my best to create a testcase that can at least sometimes reproduce the
problem (which unfortunately depends all too much on the thread scheduling
order).

Thanks for the response,
takeshi

On Thu, Jan 20, 2022 at 4:09 PM Justin Bertram <jbert...@apache.org> wrote:

> Making a method synchronized typically isn't the preferred solution to
> concurrency issues due to the performance penalty it typically imposes.
> Generally targeted locks are better. However, concurrency issues are often
> quite difficult to diagnose and resolve so sometimes a less-than-ideal
> solution is better than broken code.
>
> I currently have a large PR [1] for MQTT 5 support that is yet to be
> merged. Once that is merged I'll take a closer look at the deadlock issue
> from ARTEMIS-3622 that you reported.
>
>
> Justin
>
> [1] https://github.com/apache/activemq-artemis/pull/3907
>
> On Thu, Jan 20, 2022 at 11:08 AM Takeshi Fukushima <takesh...@gmail.com>
> wrote:
>
> > Hello everyone
> >
> > Last month I've opened up a ticket regarding a deadlock that was
> occurring
> > on one of our servers.
> > https://issues.apache.org/jira/browse/ARTEMIS-3622
> >
> > After much trial and error, I think I've narrowed it down the cause and
> > it's in the comments, but being a racing condition I cannot reliably
> > reproduce in a test case. I can reproduce manually (carefully advancing
> the
> > threads in the debugger).
> >
> > I also did patch our server with the quickfix indicated in the comments
> and
> > I will let you know if that fixes the problem (so far so good, but its
> been
> > only about 18 hours). If everything goes well, I will create a PR but I
> > would like to know if a simple solution like that (marking a couple of
> > methods as synchronized) would be acceptable, even though I think it
> > doesn't fixes all the cases (that one would require much in depth
> changes I
> > think).
> >
> > Thanks for the great software anyway,
> > takeshi
> >
> > --
> > http://mapsdev.blogspot.com/
> > Marcelo Takeshi Fukushima
> >
>


-- 
http://mapsdev.blogspot.com/
Marcelo Takeshi Fukushima

Re: Possible deadlock with MQTT

Reply via email to